
Change Data Capture: The Magic Wand We Forgot

Change Data Capture: The Magic Wand We Forgot

Talk given at Berlin Buzzwords, Berlin, Germany on 2 June 2015. http://martin.kleppmann.com/2015/06/02/change-capture-at-berlin-buzzwords.html

A simple application may start out with one database, but as you scale and add features, it usually turns into a tangled mess of datastores, replicas, caches, search indexes, analytics systems and message queues. When new data is written, how do you make sure it ends up in all the right places? If something goes wrong, how do you recover?

Change Data Capture (CDC) is an old idea: let the application subscribe to a stream of everything that is written to a database, a feed of data changes. You can use that feed to update search indexes, invalidate caches, create snapshots, generate recommendations, copy data into another database, and so on. LinkedIn’s Databus and Facebook’s Wormhole, for example, do exactly this. But the idea is not as widely known as it should be.
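To make the idea concrete, here is a minimal in-memory sketch of one change feed driving two derived views. It assumes a hypothetical `ChangeEvent` record (offset, key, new value, with `None` meaning a delete); real CDC systems such as Databus, Wormhole or Bottled Water define their own event formats. `SearchIndex`, `Cache` and `consume` are likewise illustrative names, not part of any library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChangeEvent:
    """One entry in a hypothetical change feed: a write to a single key."""
    offset: int              # position in the feed, increasing in commit order
    key: str
    value: Optional[dict]    # None means the row was deleted


class SearchIndex:
    """Toy derived view: maps each word of a row's 'text' field to the keys containing it."""

    def __init__(self) -> None:
        self.docs: dict[str, set[str]] = {}      # key -> words currently indexed for it
        self.postings: dict[str, set[str]] = {}  # word -> keys containing that word

    def apply(self, event: ChangeEvent) -> None:
        # Remove whatever was previously indexed for this key.
        for word in self.docs.pop(event.key, set()):
            self.postings[word].discard(event.key)
        # Index the new value, unless the row was deleted.
        if event.value is not None:
            words = set(event.value.get("text", "").lower().split())
            self.docs[event.key] = words
            for word in words:
                self.postings.setdefault(word, set()).add(event.key)

    def search(self, word: str) -> set[str]:
        return set(self.postings.get(word.lower(), set()))


class Cache:
    """Toy cache that invalidates its entry for any key that changes."""

    def __init__(self) -> None:
        self.entries: dict[str, dict] = {}

    def apply(self, event: ChangeEvent) -> None:
        self.entries.pop(event.key, None)  # next read falls through to the database


def consume(feed: list[ChangeEvent], consumers: list) -> None:
    """Deliver every event, in feed order, to every consumer."""
    for event in sorted(feed, key=lambda e: e.offset):
        for consumer in consumers:
            consumer.apply(event)


# Usage: two inserts and a delete flow through the same feed to both views.
feed = [
    ChangeEvent(1, "user:1", {"text": "Kafka and Postgres"}),
    ChangeEvent(2, "user:2", {"text": "Postgres replication"}),
    ChangeEvent(3, "user:1", None),
]
index, cache = SearchIndex(), Cache()
cache.entries["user:1"] = {"text": "stale"}
consume(feed, [index, cache])
print(index.search("postgres"))  # {'user:2'}
```

The point of the design is that each consumer only needs the ordered feed: adding another derived view means writing another `apply` method, not touching the code that writes to the database.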

In this talk, I will explain why change data capture is so useful, and how it prevents race conditions and other ugly problems. Then I’ll go into the practical details of implementing CDC with PostgreSQL and Apache Kafka, and discuss the approaches you can use to do the same with various other databases.
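One common way to illustrate the race condition is the "dual writes" pattern, where the application itself writes each change to both the database and the index. The sketch below is a toy model under stated assumptions: a single key `x`, two clients, and one hand-picked interleaving of their steps. `Store`, `dual_writes_interleaved` and `change_data_capture` are hypothetical stand-ins, not PostgreSQL, Kafka or any CDC tool. With dual writes the two systems can end up permanently disagreeing; when the index instead replays the database's own log, it sees the writes in the database's commit order and converges to the same value.

```python
class Store:
    """A trivially simple key-value store standing in for a database or a search index."""

    def __init__(self) -> None:
        self.data: dict[str, str] = {}
        self.log: list[tuple[str, str]] = []  # change feed: (key, value) in commit order

    def write(self, key: str, value: str) -> None:
        self.data[key] = value
        self.log.append((key, value))


def dual_writes_interleaved() -> tuple[Store, Store]:
    """Each client writes to the database, then to the index; their steps interleave."""
    db, index = Store(), Store()
    db.write("x", "A")      # client 1 -> database
    db.write("x", "B")      # client 2 -> database
    index.write("x", "B")   # client 2 -> index
    index.write("x", "A")   # client 1 -> index (arrives last)
    return db, index


def change_data_capture() -> tuple[Store, Store]:
    """Clients write only to the database; the index consumes the database's log in order."""
    db, index = Store(), Store()
    db.write("x", "A")      # client 1
    db.write("x", "B")      # client 2
    for key, value in db.log:
        index.write(key, value)
    return db, index


db, index = dual_writes_interleaved()
print(db.data["x"], index.data["x"])  # prints: B A (the two systems disagree)

db, index = change_data_capture()
print(db.data["x"], index.data["x"])  # prints: B B (the index matches the database)
```

Note that the CDC version does not need any locking between the clients: the database already decides a single order for the writes, and the consumer simply follows it.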

A new era of sanity in data systems awaits!

Martin Kleppmann

June 02, 2015

Further reading

1. Martin Kleppmann: “Bottled Water: Real-time integration of PostgreSQL and Kafka.” 23 April 2015. http://blog.confluent.io/2015/04/23/bottled-water-real-time-integration-of-postgresql-and-kafka/
2. Shirshanka Das, Chavdar Botev, Kapil Surlaker, et al.: “All Aboard the Databus!,” at ACM Symposium on Cloud Computing (SoCC), October 2012. http://www.socc2012.org/s18-das.pdf
3. Yogeshwer Sharma, Philippe Ajoux, Petchean Ang, et al.: “Wormhole: Reliable Pub-Sub to Support Geo-replicated Internet Services,” at 12th USENIX Symposium on Networked Systems Design and Implementation (NSDI), May 2015. https://www.usenix.org/system/files/conference/nsdi15/nsdi15-paper-sharma.pdf
4. Jay Kreps: “I ♥︎ Logs.” O’Reilly Media, September 2014. http://shop.oreilly.com/product/0636920034339.do
5. Martin Kleppmann: “Designing Data-Intensive Applications.” O’Reilly Media, to appear in 2015. http://dataintensive.net
6. Martin Kleppmann: “Turning the database inside-out with Apache Samza.” 4 March 2015. http://blog.confluent.io/2015/03/04/turning-the-database-inside-out-with-apache-samza/
7. Pat Helland: “Immutability Changes Everything,” at 7th Biennial Conference on Innovative Data Systems Research (CIDR), January 2015. http://www.cidrdb.org/cidr2015/Papers/CIDR15_Paper16.pdf