Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Capture

Change Data Capture (CDC) is widely used at Stripe to capture data changes from databases without critically impacting database reliability and scalability. CDC powers many critical financial use cases at Stripe, such as the Stripe Dashboard, Stripe Search, Sigma, and Financial Reporting.

From idea to production, things may seem straightforward at first, but the details matter. We detail our journey of how we leveraged Flink for Change Data Capture at Stripe in order to uphold the highest data quality standards. Freshness, Coverage, and Correctness SLOs are paramount to the success of platforms and applications running on top of our CDC infrastructure.

Change Event Streams are ubiquitous across Stripe given the vast number of applications and employees generating datasets worldwide. Change Event Streams are independent from one another, which leads to the typical challenges in distributed systems. One of the major use cases revolves around aggregating individual change events of a database transaction to support Stripe’s payments infrastructure.
Agenda
1. CDC at Stripe
2. Aggregating Change Events
3. How it Started, How it Ended
CDC at Stripe
• Abstract away database internals such as sharding topology and ensure a datastore-agnostic transport.
• Build a high-leverage platform that makes working with Change Events interoperable with other systems within the organization.
• Keep toil minimal as we scale the number of datasets: ensure clean separation between infrastructure and user issues, create great operator experiences, reduce control plane and data plane blast radius, and maintain good operator tooling, developer experience, and processes.
• … data use transactions
• Arbitrary number of tables in a database transaction
• They should be able to get transactions back out from the CDC path
• They shouldn’t have to become stream processing experts
Aggregating Change Events

Union
… stream.
• Requires streams of the same type
[Diagram: a timeline with two streams, Change Events (Event 1, Event 2, Event 3) and Transaction Metadata Events (BEGIN, COMMIT, BEGIN, COMMIT). No output; won’t compile because the streams are of different types.]
Either
… left or right stream.
• Out-of-box
• No concept of keys
[Diagram: the same two streams. Each output is an Either: for Event 1, Either.left = Event 1 and Either.right = null; for BEGIN, Either.left = null and Either.right = BEGIN; and so on.]
WrappedEvent
Wraps an event containing one of two types, either from the left or right stream, and a common key among both events.
• Small and simple code addition
• Need to extract keys
[Diagram: the same two streams. Each output is a WrappedEvent, e.g. for Event 1: WrappedEvent.key = txn-1, WrappedEvent.left = Event 1, WrappedEvent.right = null.]
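A minimal sketch of such a wrapper, in plain Scala with no Flink dependency. The field names on the two event types are hypothetical, and `Option` is used here in place of the `null` shown on the slide. In a Flink job, the two `wrap` functions would plausibly live in the two methods of a `CoFlatMapFunction`; that mapping is an assumption, not something the slides confirm.

```scala
object WrappedEventSketch {
  // Hypothetical event shapes; only the transaction ID matters for keying.
  final case class ChangeEvent(transactionId: String, globalPosition: Int)
  final case class TransactionMetadataEvent(transactionId: String, marker: String)

  // Like Either, plus the key shared by both sides.
  // Exactly one of left/right is expected to be defined.
  final case class WrappedEvent(
      key: String,
      left: Option[ChangeEvent],
      right: Option[TransactionMetadataEvent])

  // Wrap an element of the left (change event) stream.
  def wrapLeft(e: ChangeEvent): WrappedEvent =
    WrappedEvent(e.transactionId, Some(e), None)

  // Wrap an element of the right (transaction metadata) stream.
  def wrapRight(e: TransactionMetadataEvent): WrappedEvent =
    WrappedEvent(e.transactionId, None, Some(e))
}
```

Once both streams share one type and expose `key`, grouping by transaction ID becomes a plain `groupBy(_.key)` on a collection, or `keyBy(_.key)` on a stream.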
… and name omitted.
    .connect(changeEventStream)          // Union different types.
    .flatMap(new WrappedEventFunction)   // Like Either type, but with extra fields.
    .keyBy(_.key)                        // Group events with the same transaction ID.
• … Metadata Event Stream
• Change Events must have the same transaction IDs
• Handle late-arriving or duplicate Change Events and Transaction Metadata Events
• Don’t result in infinite state growth
Sliding Window
Assigns elements to windows of a fixed size, but with a slide interval.
• Almost like a tumbling window, but with overlapping windows
• Late-arriving events? Same as tumbling windows.
• Slide interval? Explosion of windows
• Not quite right…
[Diagram: Change Events (Event 1, Event 2, Event 3) and Transaction Metadata Events (BEGIN, COMMIT, BEGIN, COMMIT) on a timeline, covered by overlapping windows.]
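The "explosion of windows" can be made concrete with a little arithmetic: every element belongs to roughly size / slide overlapping windows. The sketch below is plain Scala, not Flink code; it assumes non-negative timestamps, a zero window offset, and windows aligned to multiples of the slide interval.

```scala
object SlidingWindowSketch {
  // Returns the [start, end) windows that contain `timestamp`,
  // for windows of length `size` that start every `slide` time units.
  def windowsFor(timestamp: Long, size: Long, slide: Long): List[(Long, Long)] = {
    require(timestamp >= 0 && size > 0 && slide > 0)
    // Start of the latest window that still contains the timestamp.
    val lastStart = timestamp - (timestamp % slide)
    Iterator
      .iterate(lastStart)(_ - slide)       // walk back one slide at a time
      .takeWhile(_ > timestamp - size)     // stop once the window no longer covers it
      .map(start => (start, start + size))
      .toList
      .reverse                             // earliest window first
  }
}
```

With a size of 10 and a slide of 5, an element at time 25 falls into two windows. With a size of 60 and a slide of 1, the same element falls into 60 windows, each of which needs its own state and evaluation.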
Session Window
Groups elements that arrive relatively close to each other.
• Arbitrarily sized windows; no fixed start and end
• Windows don’t overlap
• Windows close based on a defined gap of inactivity
• Session gap too small? Incomplete aggregates
• Session gap too big? Trade-off: Freshness vs Correctness
• Not quite right…
[Diagram: Change Events (Event 1, Event 2, Event 3) and Transaction Metadata Events (BEGIN, COMMIT, BEGIN, COMMIT) on a timeline, grouped into session windows.]
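The gap trade-off can be illustrated with a small simulation in plain Scala (not Flink code). It assumes that all timestamps belong to one transaction key and that two elements share a session when they are less than `gap` apart.

```scala
object SessionWindowSketch {
  // Groups timestamps into sessions: a new session starts whenever the
  // distance to the previous element is at least `gap`.
  def sessions(timestamps: Seq[Long], gap: Long): List[List[Long]] =
    timestamps.sorted
      .foldLeft(List.empty[List[Long]]) {
        // Close enough to the most recent element: extend the current session.
        case (current :: done, t) if t - current.head < gap =>
          (t :: current) :: done
        // Otherwise open a new session.
        case (acc, t) =>
          List(t) :: acc
      }
      .map(_.reverse)
      .reverse
}
```

For events at times 0, 100, and 900, a gap of 500 splits the late event into its own session, which corresponds to an incomplete aggregate. A gap of 1000 keeps all three together, but the window can only close 1000 time units after the last event, which costs freshness.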
Global Window
• Only a single window per key
• Window never closes
• Outputs never get evaluated and materialized
• Needs more…
[Diagram: Change Events (Event 1, Event 2, Event 3) and Transaction Metadata Events (BEGIN, COMMIT, BEGIN, COMMIT) on a timeline, inside a single window that never closes.]
… a Global Window and add a custom stateful trigger.
• Flexibly define open/close conditions for non-overlapping windows
• Reasonably handle late-arriving events
• Avoid infinite state growth and reduce likelihood of incomplete aggregates
Custom trigger logic (pseudocode)

    … begin transaction marker:
            update begin marker state
        else:
            update commit marker state
            update bitmap state using commit marker’s total event count
        set timeout state and register event time timer
    else:
        update bitmap state with change event’s global position
        set timeout state and register event time timer
    if should trigger(begin, commit, total events):
        clear window
        TriggerResult.FIRE_AND_PURGE
    else:
        TriggerResult.CONTINUE

Reference

    // ChangeEvent#transaction
    {
      "id": "transaction-id",
      "global_position": 1,
      "source_position": 1
    }

    // TransactionMetadataEvent
    {
      "id": "transaction-id",
      "ts_utc": 1659375300000,
      "marker": "COMMIT",
      "total_events": 3,
      "per_source_event_counts": [{ ... }]
    }
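The trigger pseudocode can be modeled as a small state machine. What follows is a sketch in plain Scala rather than an actual Flink `Trigger`, and it rests on several assumptions: the event classes are hypothetical, the "should trigger" rule is guessed to be "both markers seen and as many distinct positions as the commit marker's total event count", and the timeout simply fires whatever has been collected. A bitmap makes duplicate change events harmless, because setting the same position twice changes nothing.

```scala
import scala.collection.immutable.BitSet

object TransactionTriggerSketch {
  // Hypothetical event model, loosely following the reference JSON.
  sealed trait Event { def id: String }
  final case class ChangeEvent(id: String, globalPosition: Int) extends Event
  final case class Begin(id: String) extends Event
  final case class Commit(id: String, totalEvents: Int) extends Event

  // Stand-ins for Flink's TriggerResult.CONTINUE and FIRE_AND_PURGE.
  sealed trait Result
  case object Continue extends Result
  case object FireAndPurge extends Result

  // Per-transaction (per-key) trigger state.
  final case class TxnState(
      beginSeen: Boolean = false,
      totalEvents: Option[Int] = None,      // known once COMMIT arrives
      seenPositions: BitSet = BitSet.empty) {
    // Assumed rule: fire once both markers and all counted events are present.
    def shouldTrigger: Boolean =
      beginSeen && totalEvents.contains(seenPositions.size)
  }

  // Called for every element of the keyed window; arrival order does not matter.
  def onElement(state: TxnState, event: Event): (TxnState, Result) = {
    val updated = event match {
      case Begin(_)                 => state.copy(beginSeen = true)
      case Commit(_, total)         => state.copy(totalEvents = Some(total))
      case ChangeEvent(_, position) =>
        state.copy(seenPositions = state.seenPositions + position)
    }
    if (updated.shouldTrigger) (TxnState(), FireAndPurge) // clear state on fire
    else (updated, Continue)
  }

  // Timeout path: fire and clear so that state cannot grow forever.
  def onTimeout(state: TxnState): (TxnState, Result) = (TxnState(), FireAndPurge)
}
```

Feeding the events of one transaction through `onElement` in any order, including a duplicate change event, yields `Continue` until the last missing piece arrives and `FireAndPurge` on that final element.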
Flink Job Definition

    …
      .connect(changeEventStream)                     // Union different types.
      .flatMap(new WrappedEventFunction)              // Like Either type, but with extra fields.
      .keyBy(_.key)                                   // Group events with the same transaction ID.
      .window(GlobalWindows.create)
      .trigger(new TransactionBoundaryTrigger(...))   // Flexible windowing semantics.
      .process(new KeyedProcessor(...))

    mainStream                                        // Side output to DLQ.
      .getSideOutput(...)
      .addSink(...)

    mainStream                                        // Output aggregated change events.
      .addSink(...)
How it Started, How it Ended

… getting mixed up
• Reduce parallelism on Source Sub Tasks for all streams
• Make sure parallelism ≤ ∑ Topic Partitions
• Generally, check with SplitEnumerator classes
… use event time
• More precise
• Not perfect; can still result in incomplete aggregates in edge cases
• That’s the reality of streaming
… Subscriber
• Subscribe to all topics (for a keyspace) by default
• A control plane (external) service produces an event to a Broadcast Stream
• On broadcast element, use Broadcast State to keep onboarded datasets in state
• On element, check Broadcast State and filter for onboarded datasets
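The broadcast-and-filter pattern described here can be sketched without Flink as two pure functions over a set of onboarded datasets. All type and dataset names are hypothetical, and the `Offboard` control event is an assumption added for illustration. In a Flink job, these two functions would plausibly map onto the broadcast-side and element-side callbacks of a broadcast process function, with the set held in Broadcast State.

```scala
object OnboardingFilterSketch {
  // Control-plane events arriving on the broadcast stream (hypothetical shapes).
  sealed trait ControlEvent
  final case class Onboard(dataset: String) extends ControlEvent
  final case class Offboard(dataset: String) extends ControlEvent

  // Data-plane events arriving from the subscribed topics.
  final case class ChangeEvent(dataset: String, payload: String)

  // On broadcast element: update the set of onboarded datasets.
  def onBroadcastElement(onboarded: Set[String], event: ControlEvent): Set[String] =
    event match {
      case Onboard(dataset)  => onboarded + dataset
      case Offboard(dataset) => onboarded - dataset
    }

  // On element: pass the event through only if its dataset is onboarded.
  def onElement(onboarded: Set[String], event: ChangeEvent): Option[ChangeEvent] =
    if (onboarded.contains(event.dataset)) Some(event) else None
}
```

The design choice this reflects is that the job subscribes broadly and filters late, so onboarding a dataset becomes a control event rather than a job redeploy.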
Fix
• … store or bloom filter
• Move incomplete aggregate measurement out of the Flink Job and into a system downstream
New system needs to dedupe events… for all time?
Wrap Up
• Change Data Capture (CDC) is widely used at Stripe to improve database reliability and scalability
• Flink is a critical component in Stripe’s CDC infrastructure that allows us to work with financial streaming data with high data quality guarantees