
Change Data Streaming Patterns For Microservices With Debezium (Apache Kafka Meetup Hamburg)

Apache Kafka has become the de facto standard for asynchronous event
propagation between microservices. Things get challenging, though, when
you add a service's database to the picture: how can you avoid
inconsistencies between Kafka and the database?

Enter change data capture (CDC) and Debezium. By capturing changes
from the log files of the database, Debezium gives you both reliable
and consistent inter-service messaging via Kafka and instant
read-your-own-write semantics for services themselves. Join this
session to learn how to leverage CDC for reliable microservices
integration and solving typical challenges such as gradually
extracting microservices from existing monoliths, maintaining
different read models in CQRS-style architectures, enabling streaming
queries on your operational data, updating caches as well as full-text
indexes and more.

You’ll find out how Debezium streams all the changes from datastores
such as MySQL, PostgreSQL, SQL Server and MongoDB into Kafka and how
you can react to the change events in near real-time. In a live demo
we'll show how to set up a change data stream out of your
application's database, without any code changes needed.

Gunnar Morling

February 12, 2019

Transcript

  1. Change Data Streaming Patterns for Microservices With Debezium
     Gunnar Morling (@gunnarmorling)
  2. Gunnar Morling
     Open source software engineer at Red Hat (Debezium, Hibernate)
     Spec Lead for Bean Validation 2.0
     Other projects: Deptective, MapStruct
     [email protected] | @gunnarmorling | http://in.relation.to/gunnar-morling/
     #Debezium @gunnarmorling
  3. Change Data Capture: What is it about?
     Get an event stream with all data and schema changes in your DB.
     (Diagram: DB 1 feeding Apache Kafka)
  4. CDC Use Cases: Data Replication
     Replicate data to another DB; feed an analytics system or DWH; feed data to other teams.
     (Diagram: DB 1 to Apache Kafka to DB 2)
  5. CDC Use Cases: Others
     Auditing/historization; updating or invalidating caches; enabling full-text search via Elasticsearch, Solr etc.; updating CQRS read models; UI live updates; enabling streaming queries.
  6. Debezium: Change Data Capture Platform
     Retrieves change events from the TX logs of different DBs; transparent to writing apps; comprehensive type support (PostGIS etc.); snapshotting, filtering etc.
     Fully open source, very active community.
     Latest version: 0.9 (based on Kafka 2.0).
     Production deployments at multiple companies (e.g. WePay, Trivago, BlaBlaCar).
  7. Advantages of Log-based CDC: Tailing the Transaction Log
     All data changes are captured; no polling delay or overhead; transparent to writing applications and models; can capture deletes; can capture old record state and further metadata.
     Different log formats/APIs per database, but Debezium deals with this.
  8. Change Event Structure
     Key (PK of table) and value. Payload: before state, after state, source info.
     Serialization formats: JSON, or Avro (with Confluent Schema Registry).
     {
       "schema": { ... },
       "payload": {
         "before": null,
         "after": {
           "id": 1004,
           "first_name": "Anne",
           "last_name": "Kretchmar",
           "email": "[email protected]"
         },
         "source": {
           "name": "dbserver1",
           "server_id": 0,
           "ts_sec": 0,
           "file": "mysql-bin.000003",
           "pos": 154,
           "row": 0,
           "snapshot": true,
           "db": "inventory",
           "table": "customers"
         },
         "op": "c",
         "ts_ms": 1486500577691
       }
     }
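Such an envelope can be consumed like any other Kafka record. A minimal sketch in Python of applying one change event to a local read model; the payload mirrors the slide's example, and `apply_change` is a hypothetical helper, not part of Debezium:

```python
import json

# Change event payload as shown on the slide (schema part omitted).
event = json.loads("""
{
  "payload": {
    "before": null,
    "after": {
      "id": 1004,
      "first_name": "Anne",
      "last_name": "Kretchmar",
      "email": "[email protected]"
    },
    "source": {"db": "inventory", "table": "customers", "snapshot": true},
    "op": "c",
    "ts_ms": 1486500577691
  }
}
""")

def apply_change(event, store):
    """Apply a single change event to a simple dict-based read model."""
    payload = event["payload"]
    op = payload["op"]  # "c" = create, "u" = update, "d" = delete, "r" = snapshot read
    if op in ("c", "u", "r"):
        row = payload["after"]
        store[row["id"]] = row
    elif op == "d":
        store.pop(payload["before"]["id"], None)
    return store

store = apply_change(event, {})
```

The "before"/"after" pair is what makes patterns like cache updates or CQRS read models possible: a consumer never needs to query the source database to know the new row state.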
  9. Debezium Connectors
     MySQL, Postgres, MongoDB, SQL Server, Oracle (tech preview, based on XStream).
     Possible future additions: Cassandra? MariaDB?
  10. CDC with Debezium and Kafka Connect
      (Diagram: Postgres and MySQL as source databases, Apache Kafka as the target)
  11. CDC with Debezium and Kafka Connect
      (Diagram adds Kafka Connect between the databases and Kafka)
  12. CDC with Debezium and Kafka Connect
      (Diagram adds the DBZ PG and DBZ MySQL source connectors inside Kafka Connect)
  13. CDC with Debezium and Kafka Connect
      (Diagram adds Elasticsearch, fed from Kafka via the ES sink connector)
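A source connector like the ones in the diagram is set up purely through configuration submitted to Kafka Connect's REST API, with no application code changes. A sketch of such a registration payload, with property names along the lines of Debezium 0.9's MySQL connector; the host names, credentials and database names are made-up example values:

```python
import json

# Hypothetical registration payload for a Debezium MySQL source connector.
# All connection values below are placeholders for illustration only.
connector = {
    "name": "inventory-connector",
    "config": {
        "connector.class": "io.debezium.connector.mysql.MySqlConnector",
        "database.hostname": "mysql",          # example host
        "database.port": "3306",
        "database.user": "debezium",           # example credentials
        "database.password": "dbz",
        "database.server.id": "184054",        # must be unique in the MySQL cluster
        "database.server.name": "dbserver1",   # logical name, used as topic prefix
        "database.whitelist": "inventory",     # only capture this database
        "database.history.kafka.bootstrap.servers": "kafka:9092",
        "database.history.kafka.topic": "schema-changes.inventory"
    }
}

# This JSON would be POSTed to the Kafka Connect REST endpoint,
# e.g. http://<connect-host>:8083/connectors
payload = json.dumps(connector, indent=2)
```

Change events for captured tables then appear on topics prefixed with the logical server name, e.g. dbserver1.inventory.customers.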
  14. Pattern: Microservice Data Synchronization
      Microservice architectures: propagate data between different services without coupling; each service keeps optimised views locally.
      (Diagram: Order, Item and Stock services, each with a local DB, exchanging item and stock changes)
  15. Pattern: Outbox. Avoiding Dual Writes
      Source DB with an "Events" table, streamed by Debezium via Kafka Connect into Kafka topics (Order Events, Credit Worthiness Check Events), which are consumed by the Shipment and Customer services.
      Events table: ID | Category | Type | Payload
      123 | Order | OrderCreated | { "id" : 123, ... }
      456 | Order | OrderDetailCanceled | { "id" : 456, ... }
      789 | ... | ... | ...
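The core of the outbox pattern is that the business table and the events table are written in one local transaction, so there is no dual write to the database and the broker; Debezium then streams the outbox rows to Kafka. A minimal sketch using SQLite as a stand-in database; the schema and helper are illustrative, only the column names mirror the slide:

```python
import json
import sqlite3

# Stand-in database; in practice this would be the service's own DB.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
conn.execute("""CREATE TABLE outbox (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    category TEXT, type TEXT, payload TEXT)""")

def place_order(conn, order_id, total):
    # One transaction: both inserts commit together or roll back together.
    with conn:
        conn.execute("INSERT INTO orders (id, total) VALUES (?, ?)",
                     (order_id, total))
        conn.execute(
            "INSERT INTO outbox (category, type, payload) VALUES (?, ?, ?)",
            ("Order", "OrderCreated",
             json.dumps({"id": order_id, "total": total})))

place_order(conn, 123, 42.0)
```

Consumers only ever see the outbox rows as Kafka messages, so the event payload, not the table schema, becomes the service's public contract.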
  16. Pattern: Microservice Extraction. Migrating from Monoliths to Microservices
      Extract a microservice for single component(s); keep write requests against the running monolith; stream changes to the extracted microservice; test the new functionality; switch over, and evolve the schema only afterwards.
  17. Pattern: Ensuring Data Quality. Detecting Missing or Wrong Data
      Constantly compare record counts on the source and sink side; raise an alert if a threshold is reached.
      Compare every n-th record field by field, e.g. so that all records are compared within one week.
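The count comparison described above can be sketched in a few lines; the counting inputs here stand in for real queries against the source database and the sink, and the threshold is an arbitrary example value:

```python
def check_counts(source_count, sink_count, threshold=100):
    """Return an alert message if source and sink counts diverge
    by more than the threshold, else None.

    source_count/sink_count would come from e.g. SELECT COUNT(*)
    on the source table and a count query against the sink.
    """
    lag = source_count - sink_count
    if abs(lag) > threshold:
        return f"ALERT: source/sink counts differ by {lag} records"
    return None
```

A scheduler would run this periodically; the field-by-field spot check of every n-th record mentioned on the slide would complement it, since matching counts alone do not prove the record contents are identical.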
  18. Pattern: Leverage the Powers of SMTs. Single Message Transformations
      Aggregate sharded tables into a single topic; keep compatibility with existing consumers; format conversions, e.g. for dates; ensure compatibility with sink connectors; extract the "after" state only; expand MongoDB's JSON structures.
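To make the "extract the after state only" item concrete: in Kafka Connect such a transformation is configured declaratively on the connector, but its effect on each record is easy to sketch in plain Python (this is an illustration of what the SMT does, not Debezium's implementation):

```python
def unwrap(envelope):
    """Collapse a Debezium-style change event payload to just its
    'after' state, i.e. the new row (None for deletes). Sink
    connectors that expect flat records can then consume it directly."""
    return envelope["payload"]["after"]

# Example envelope, trimmed to the fields the transformation touches.
record = {"payload": {"before": None,
                      "after": {"id": 1004, "first_name": "Anne"},
                      "op": "c"}}
```

Because the transformation runs inside Kafka Connect, existing consumers and sink connectors need no code changes to work with the flattened records.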
  19. Running Debezium on Kubernetes
      AMQ Streams: enterprise distribution of Apache Kafka. Provides container images for Apache Kafka, Connect, ZooKeeper and MirrorMaker; operators for managing/configuring Apache Kafka clusters, topics and users; Kafka consumer, producer and admin clients, and Kafka Streams.
      Supported by Red Hat. Upstream community: Strimzi.
  20. Summary
      CDC enables use cases such as replication, microservices data exchange and much more.
      Debezium: CDC for a growing number of databases.
      Contributions welcome! Tell us about your feature requests and ideas!
  21. Resources
      Website (source code, examples, Compose files etc.): http://debezium.io/
      Source code: https://github.com/debezium
      Discussion group: https://groups.google.com/forum/#!forum/debezium
      Strimzi (Kafka on Kubernetes/OpenShift): http://strimzi.io/
      Latest news: @debezium