Change Data Streaming Patterns for Microservices With Debezium (Kafka Summit London)

Debezium (noun | de·be·zi·um | /dɪ:ˈbɪ:ziːəm/) – Secret Sauce for Change Data Capture

Apache Kafka has become the de facto standard for asynchronous event propagation between microservices. Things get challenging, though, when adding a service’s database to the picture: how can you avoid inconsistencies between Kafka and the database? Enter change data capture (CDC) and Debezium. By capturing changes from the database’s log files, Debezium gives you both reliable, consistent inter-service messaging via Kafka and instant read-your-own-write semantics for the services themselves.

Join this session to learn how to leverage CDC for reliable microservices integration and for solving typical challenges such as gradually extracting microservices from existing monoliths, maintaining different read models in CQRS-style architectures, and updating caches as well as full-text indexes. You’ll find out how Debezium streams all the changes from datastores such as MySQL, PostgreSQL, SQL Server and MongoDB into Kafka, and how Debezium is designed not to compromise on data correctness and completeness even if things go wrong. In a live demo we’ll show how to use Debezium to set up a change data stream out of your application’s database, without any code changes needed. You’ll see how to consume change events in other services, how to gain real-time insight into your changing data using Kafka Streams, and much more.

Gunnar Morling

May 14, 2019

Transcript

1. Change Data Streaming Patterns for Microservices With Debezium
   Gunnar Morling, Software Engineer

2. Gunnar Morling
   Open source software engineer at Red Hat: Debezium, Hibernate. Spec Lead for Bean Validation 2.0.
   Other projects: Deptective, MapStruct. Java Champion.
   [email protected] | @gunnarmorling | http://in.relation.to/gunnar-morling/

3. A Common Problem: Updating Multiple Resources
   Order Service writing to its database.

4. A Common Problem: Updating Multiple Resources
   Order Service writing to its database and a cache.

5. A Common Problem: Updating Multiple Resources
   Order Service writing to its database, a cache and a search index.

6. A Common Problem: Updating Multiple Resources
   Order Service writing to its database, a cache and a search index.
   "Friends Don't Let Friends Do Dual Writes"

7. A Better Solution: Streaming Change Events From the Database
   Order Service writing only to its database.

8. A Better Solution: Streaming Change Events From the Database
   Order Service; change data capture produces a stream of change events (C - Create, U - Update, D - Delete).

9. A Better Solution: Streaming Change Events From the Database
   Order Service; change data capture produces a stream of change events (C - Create, U - Update, D - Delete).

10. Debezium: Change Data Capture Platform
    Retrieves change events from the TX logs of different DBs. Transparent to writing apps.
    Comprehensive type support (PostGIS etc.). Snapshotting, filtering etc.
    Fully open source, very active community. Latest version: 0.9 (based on Kafka 2.2).
    Production deployments at multiple companies (e.g. WePay, Trivago, BlaBlaCar).

11. Advantages of Log-based CDC: Tailing the Transaction Log
    All data changes are captured. No polling delay or overhead.
    Transparent to writing applications and models. Can capture deletes.
    Can capture old record state and further metadata.
    Different formats/APIs, but Debezium deals with this.

12. Debezium CDC Use Cases
    Update or invalidate caches. Enable full-text search via Elasticsearch, Solr etc.
    Data replication. Microservices data exchange. Auditing/historization.
    Update CQRS read models. Enable streaming queries.

13. Change Event Structure
    Key: PK of the table. Value: describes the change event (before state, after state, source info).
    Serialization formats: JSON, Avro. Example value:
    {
      "before": null,
      "after": {
        "id": 1004,
        "first_name": "Anne",
        "last_name": "Kretchmar",
        "email": "[email protected]"
      },
      "source": {
        "name": "dbserver1",
        "server_id": 0,
        "ts_sec": 0,
        "file": "mysql-bin.000003",
        "pos": 154,
        "row": 0,
        "snapshot": true,
        "db": "inventory",
        "table": "customers"
      },
      "op": "c",
      "ts_ms": 1486500577691
    }

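As an illustration of consuming these events, here is a minimal sketch of a Kafka consumer that reads such change events from the customers topic and extracts the "after" state. It assumes the JSON converter with schemas disabled, a local broker, and the Jackson library; the topic and group names are made up for the example.

    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    import com.fasterxml.jackson.databind.JsonNode;
    import com.fasterxml.jackson.databind.ObjectMapper;

    public class CustomerChangeConsumer {

        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");   // assumption
            props.put("group.id", "customer-cache-updater");    // assumption
            props.put("key.deserializer", StringDeserializer.class.getName());
            props.put("value.deserializer", StringDeserializer.class.getName());

            ObjectMapper mapper = new ObjectMapper();

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                // Debezium topic name = <server name>.<database>.<table>; assumed here
                consumer.subscribe(List.of("dbserver1.inventory.customers"));

                while (true) {
                    for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(1))) {
                        if (record.value() == null) {
                            continue; // tombstone following a delete
                        }
                        JsonNode event = mapper.readTree(record.value());
                        String op = event.get("op").asText();   // c = create, u = update, d = delete
                        JsonNode after = event.get("after");    // null for deletes
                        System.out.println("op=" + op + ", after=" + after);
                    }
                }
            }
        }
    }
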
14. Debezium Connectors
    MySQL, Postgres, MongoDB, SQL Server, Oracle (Tech Preview, based on XStream).
    Possible future additions: Cassandra? MariaDB?

15. CDC with Debezium and Kafka Connect
    Postgres and MySQL as source databases, Apache Kafka as the messaging backbone.

16. CDC with Debezium and Kafka Connect
    Postgres and MySQL, Kafka Connect, Apache Kafka.

17. CDC with Debezium and Kafka Connect
    Postgres and MySQL, Kafka Connect running the Debezium Postgres and MySQL connectors, Apache Kafka.

18. CDC with Debezium and Kafka Connect
    Postgres and MySQL, Kafka Connect running the Debezium Postgres and MySQL source connectors,
    Apache Kafka, and an Elasticsearch sink connector feeding Elasticsearch.

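To make the setup above concrete, here is a hedged sketch of registering a Debezium MySQL connector via the Kafka Connect REST API (port 8083) from Java. The connector name, hostnames, credentials and topic names are placeholders, and the property names follow the Debezium 0.9-era MySQL connector (e.g. database.whitelist, database.server.name); adjust them to the version you run.

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class RegisterMySqlConnector {

        public static void main(String[] args) throws Exception {
            // Connector config; hostnames, credentials and names are placeholders.
            String config = """
                {
                  "name": "inventory-connector",
                  "config": {
                    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
                    "database.hostname": "mysql",
                    "database.port": "3306",
                    "database.user": "debezium",
                    "database.password": "dbz",
                    "database.server.id": "184054",
                    "database.server.name": "dbserver1",
                    "database.whitelist": "inventory",
                    "database.history.kafka.bootstrap.servers": "kafka:9092",
                    "database.history.kafka.topic": "schema-changes.inventory"
                  }
                }
                """;

            HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8083/connectors"))   // Kafka Connect REST API
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(config))
                .build();

            HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

            System.out.println(response.statusCode() + ": " + response.body());
        }
    }
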
19. Pattern: Microservice Data Synchronization
    Microservice architectures: propagate data between different services without coupling;
    each service keeps optimised views locally.
    Order, Item and Stock services, each with its own app and local DB, connected by streams of
    item changes and stock changes.

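One way such a local, optimised view can be maintained is with Kafka Streams, as mentioned in the abstract. The sketch below materializes the latest state per item from a change event topic into a local state store; it assumes the events have already been unwrapped to the plain row state and are keyed by item id, and the topic, store and application names are invented for the example.

    import java.util.Properties;

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.common.utils.Bytes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.Consumed;
    import org.apache.kafka.streams.kstream.KTable;
    import org.apache.kafka.streams.kstream.Materialized;
    import org.apache.kafka.streams.state.KeyValueStore;

    public class ItemViewUpdater {

        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "order-service-item-view");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumption

            StreamsBuilder builder = new StreamsBuilder();

            // The KTable keeps the latest state per item in a local, queryable store.
            KTable<String, String> items = builder.table(
                "dbserver1.inventory.items",                   // topic name is an assumption
                Consumed.with(Serdes.String(), Serdes.String()),
                Materialized.<String, String, KeyValueStore<Bytes, byte[]>>as("items-local-view"));

            // For illustration only: print each update to the local view.
            items.toStream().foreach((id, item) -> System.out.println(id + " -> " + item));

            KafkaStreams streams = new KafkaStreams(builder.build(), props);
            streams.start();

            Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        }
    }
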
20. Pattern: Outbox, Separate Events Table
    Source DB (with "Events" table), Kafka Connect with Debezium, Apache Kafka carrying Order Events and
    Credit Worthiness Check Events, consumed by Order Service, Shipment Service and Customer Service.

    Events table:
    ID  | Category | Type                | Payload
    123 | Order    | OrderCreated        | { "id" : 123, ... }
    456 | Order    | OrderDetailCanceled | { "id" : 456, ... }
    789 | ...      | ...                 | ...

    "Outbox" table:
    Id   | AggregateType | AggregateId | Type                | Payload
    ec6e | Order         | 123         | OrderCreated        | { "id" : 123, ... }
    8af8 | Order         | 456         | OrderDetailCanceled | { "id" : 456, ... }
    890b | Customer      | 789         | InvoiceCreated      | { "id" : 789, ... }

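A minimal sketch of the write side of the outbox pattern, using plain JDBC: the order row and the outbox row are inserted in the same local transaction, so Debezium will capture either both or neither. The outbox column names mirror the slide; the connection URL, the purchase_order table and the JSON payload handling are assumptions.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.util.UUID;

    public class OrderService {

        // Persists the order and its outbox event in one local transaction;
        // Debezium later captures inserts to the outbox table and publishes them to Kafka.
        public void placeOrder(long orderId, String orderJson) throws Exception {
            try (Connection con = DriverManager.getConnection(
                    "jdbc:postgresql://localhost:5432/orderdb", "postgres", "postgres")) { // assumptions
                con.setAutoCommit(false);

                try (PreparedStatement insertOrder = con.prepareStatement(
                        "INSERT INTO purchase_order (id, payload) VALUES (?, ?)");
                     PreparedStatement insertEvent = con.prepareStatement(
                        "INSERT INTO outbox (id, aggregatetype, aggregateid, type, payload) VALUES (?, ?, ?, ?, ?)")) {

                    insertOrder.setLong(1, orderId);
                    insertOrder.setString(2, orderJson);
                    insertOrder.executeUpdate();

                    insertEvent.setString(1, UUID.randomUUID().toString());
                    insertEvent.setString(2, "Order");
                    insertEvent.setString(3, String.valueOf(orderId));
                    insertEvent.setString(4, "OrderCreated");
                    insertEvent.setString(5, orderJson);
                    insertEvent.executeUpdate();

                    con.commit();
                }
                catch (Exception e) {
                    con.rollback();
                    throw e;
                }
            }
        }
    }
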
21. Pattern: Microservice Extraction
    Migrating from monoliths to microservices: extract a microservice for single component(s).
    Keep write requests against the running monolith. Stream changes to the extracted microservice.
    Test the new functionality. Switch over; evolve the schema only afterwards.

22. Pattern: Leverage the Powers of SMTs (Single Message Transformations)
    Aggregate sharded tables into a single topic. Keep compatibility with existing consumers.
    Format conversions, e.g. for dates. Ensure compatibility with sink connectors.
    Extract the "after" state only. Expand MongoDB's JSON structures.

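For the "extract the 'after' state only" case, a sketch of the extra connector configuration enabling Debezium's unwrap SMT is shown below; the class name io.debezium.transforms.UnwrapFromEnvelope is the one shipped with Debezium 0.9 (later versions renamed it), and these entries would be added to a connector registration payload like the one shown earlier.

    import java.util.Map;

    public class UnwrapSmtConfig {

        // Additional connector config entries enabling the "unwrap" SMT, which replaces the
        // full Debezium change event envelope with just the "after" row state for sinks.
        static final Map<String, String> UNWRAP_SMT = Map.of(
            "transforms", "unwrap",
            "transforms.unwrap.type", "io.debezium.transforms.UnwrapFromEnvelope");
    }
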
23. Pattern: Ensuring Data Quality, Detecting Missing or Wrong Data
    Constantly compare record counts on the source and sink side. Raise an alert if a threshold is reached.
    Compare every n-th record field by field, e.g. so that all records are compared within one week.

24. Running Debezium on Kubernetes
    AMQ Streams: enterprise distribution of Apache Kafka. Provides container images for Apache Kafka,
    Connect, ZooKeeper and MirrorMaker; operators for managing/configuring Apache Kafka clusters, topics
    and users; Kafka consumer, producer and admin clients, Kafka Streams. Supported by Red Hat.
    Upstream community: Strimzi.

25. Support for Debezium: Red Hat Integration
    Debezium is being productized as part of the Red Hat Integration product.
    Initially Microsoft SQL Server, MySQL, PostgreSQL, and MongoDB connectors. Integrated with AMQ Streams.
    Developer Preview to be released soon; general availability (GA) planned for later this year.

26. Summary
    CDC enables use cases such as replication, microservices data exchange and much more.
    Debezium: CDC for a growing number of databases.
    Contributions welcome! Tell us about your feature requests and ideas!
    "Friends Don't Let Friends Do Dual Writes"

27. Resources
    Website (source code, examples, Compose files etc.): http://debezium.io/
    Source code: https://github.com/debezium
    Discussion group: https://groups.google.com/forum/#!forum/debezium
    Strimzi (Kafka on Kubernetes/OpenShift): http://strimzi.io/
    Latest news: @debezium