Change Data Capture: The Magic Wand We Forgot

June 02, 2015

Talk given at Berlin Buzzwords, Berlin, Germany on 2 June 2015. http://martin.kleppmann.com/2015/06/02/change-capture-at-berlin-buzzwords.html

A simple application may start out with one database, but as you scale and add features, it usually turns into a tangled mess of datastores, replicas, caches, search indexes, analytics systems and message queues. When new data is written, how do you make sure it ends up in all the right places? If something goes wrong, how do you recover?

Change Data Capture (CDC) is an old idea: let the application subscribe to a stream of everything that is written to a database – a feed of data changes. You can use that feed to update search indexes, invalidate caches, create snapshots, generate recommendations, copy data into another database, and so on. For example, LinkedIn’s Databus and Facebook’s Wormhole do this. But the idea is not as widely known as it should be.
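To make the idea of a change feed concrete, here is a minimal in-memory sketch. It is not how Databus, Wormhole, or any real database exposes its changes. The `Change`, `Database`, and `Subscriber` names and the offset-based polling are assumptions made purely for illustration. Every write is appended to a log, and each subscriber replays that log in order to maintain its own derived view (a cache and a toy search index).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set


@dataclass(frozen=True)
class Change:
    """One entry in the change feed (hypothetical shape)."""
    offset: int            # position in the feed, strictly increasing
    key: str
    value: Optional[str]   # None means the row was deleted


class Database:
    """Toy primary store that appends every write to a change log."""

    def __init__(self) -> None:
        self.rows: Dict[str, str] = {}
        self.log: List[Change] = []

    def write(self, key: str, value: Optional[str]) -> None:
        if value is None:
            self.rows.pop(key, None)
        else:
            self.rows[key] = value
        self.log.append(Change(len(self.log), key, value))

    def changes_since(self, offset: int) -> List[Change]:
        return self.log[offset:]


class Subscriber:
    """Remembers how far it has read and applies changes in log order."""

    def __init__(self, apply: Callable[[Change], None]) -> None:
        self.apply = apply
        self.next_offset = 0

    def poll(self, db: Database) -> None:
        for change in db.changes_since(self.next_offset):
            self.apply(change)
            self.next_offset = change.offset + 1


def apply_to_cache(cache: Dict[str, str], change: Change) -> None:
    if change.value is None:
        cache.pop(change.key, None)
    else:
        cache[change.key] = change.value


def apply_to_index(index: Dict[str, Set[str]], change: Change) -> None:
    # Drop the old postings for this key, then index the new value (if any).
    for keys in index.values():
        keys.discard(change.key)
    if change.value is not None:
        for word in change.value.lower().split():
            index.setdefault(word, set()).add(change.key)


def demo():
    db = Database()
    cache: Dict[str, str] = {}
    index: Dict[str, Set[str]] = {}
    subscribers = [
        Subscriber(lambda c: apply_to_cache(cache, c)),
        Subscriber(lambda c: apply_to_index(index, c)),
    ]
    db.write("doc1", "hello world")
    db.write("doc2", "hello kafka")
    db.write("doc1", None)          # delete
    for sub in subscribers:
        sub.poll(db)
    return db.rows, cache, index
```

Because each subscriber tracks its own offset, a new consumer can be added later and catch up from offset 0 without the application having to write to it directly.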

In this talk, I will explain why change data capture is so useful, and how it prevents race conditions and other ugly problems. Then I’ll go into the practical details of implementing CDC with PostgreSQL and Apache Kafka, and discuss the approaches you can use to do the same with various other databases.
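One kind of race condition this avoids can be sketched as follows. This is an illustrative scenario rather than necessarily the example used in the talk. It assumes two clients that each write first to the database and then to a search index ("dual writes"), with their requests interleaving. The function names are hypothetical.

```python
def dual_write_interleaving():
    """Two clients each write to the database, then to the index,
    but their requests happen to interleave."""
    db, index = {}, {}
    db["x"] = "A"       # client 1 writes the database
    db["x"] = "B"       # client 2 writes the database
    index["x"] = "B"    # client 2 reaches the index first
    index["x"] = "A"    # client 1 reaches the index last
    return db, index    # the two stores now disagree, and stay that way


def change_capture():
    """The database fixes the order of writes once, in its log;
    the index consumer replays that same order."""
    db, index, log = {}, {}, []
    for key, value in [("x", "A"), ("x", "B")]:
        db[key] = value
        log.append((key, value))
    for key, value in log:
        index[key] = value
    return db, index
```

In the first function the database ends with `B` while the index ends with `A`. In the second, both stores apply the writes in the single order recorded in the log, so they converge to the same value.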

A new era of sanity in data systems awaits!

Martin Kleppmann

Transcript

Further reading

1. Martin Kleppmann: “Bottled Water: Real-time integration of PostgreSQL and Kafka.” 23 April 2015. http://blog.confluent.io/2015/04/23/bottled-water-real-time-integration-of-postgresql-and-kafka/
2. Shirshanka Das, Chavdar Botev, Kapil Surlaker, et al.: “All Aboard the Databus!,” at ACM Symposium on Cloud Computing (SoCC), October 2012. http://www.socc2012.org/s18-das.pdf
3. Yogeshwer Sharma, Philippe Ajoux, Petchean Ang, et al.: “Wormhole: Reliable Pub-Sub to Support Geo-replicated Internet Services,” at 12th USENIX Symposium on Networked Systems Design and Implementation (NSDI), May 2015. https://www.usenix.org/system/files/conference/nsdi15/nsdi15-paper-sharma.pdf
4. Jay Kreps: “I ♥︎ Logs.” O'Reilly Media, September 2014. http://shop.oreilly.com/product/0636920034339.do
5. Martin Kleppmann: “Designing data-intensive applications.” O’Reilly Media, to appear in 2015. http://dataintensive.net
6. Martin Kleppmann: “Turning the database inside-out with Apache Samza.” 4 March 2015. http://blog.confluent.io/2015/03/04/turning-the-database-inside-out-with-apache-samza/
7. Pat Helland: “Immutability Changes Everything,” at 7th Biennial Conference on Innovative Data Systems Research (CIDR), January 2015. http://www.cidrdb.org/cidr2015/Papers/CIDR15_Paper16.pdf