Sharon Xie
July 25, 2025

Incremental Change Processing with Apache Flink and Iceberg

Apache Iceberg is a robust foundation for large-scale data lakehouses, yet its current support for change data capture (CDC) is limited, making updates and deletes challenging for incremental processing. While the unreleased Iceberg V3 will introduce native CDC support, many production environments still run on V2 and require pragmatic workarounds.

In this talk, we’ll explore how to implement incremental change processing over Iceberg V2 using Apache Flink, by writing change data streams to append tables and reading those append tables back as change streams. We’ll also walk through the trade-offs between append and upsert modes, and how to choose the right one for your workload.

Finally, we’ll preview what Iceberg V3 brings to the table with native CDC support, and how it shifts the design landscape for real-time pipelines. If you're building data pipelines on Iceberg, this session will provide you with pragmatic strategies to overcome existing limitations and scale efficiently while keeping the infrastructure simple.


Transcript

  1. Incremental Change Processing with Apache Flink and Iceberg
     Sharon Xie, Founding Engineer & Head of Product @Decodable
     2025-07-23
  2. Processing Change Stream w/ Flink (cont)
     Flink has 4 record types: insert, update_before, update_after, delete
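For reference, these four change types correspond to Flink's `RowKind` enum in its public API. A minimal sketch of how a changelog stream encodes them (the column values are made up for illustration):

```java
import org.apache.flink.types.Row;
import org.apache.flink.types.RowKind;

public class RowKindDemo {
    public static void main(String[] args) {
        // An update in a changelog stream is encoded as a retraction pair:
        // the old value (UPDATE_BEFORE) followed by the new value (UPDATE_AFTER).
        Row before = Row.ofKind(RowKind.UPDATE_BEFORE, 1L, "old balance");
        Row after  = Row.ofKind(RowKind.UPDATE_AFTER, 1L, "new balance");

        // Inserts and deletes each carry a single row.
        Row insert = Row.ofKind(RowKind.INSERT, 2L, "first value");
        Row delete = Row.ofKind(RowKind.DELETE, 2L, "first value");

        System.out.println(before + " " + after + " " + insert + " " + delete);
    }
}
```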
  3. What is needed?
     • Flink can read the changes to a table as a stream
     • Flink can write change streams to a table
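In Flink SQL terms, the two requirements on this slide look roughly like the sketch below. The table names (`orders`, `orders_sink`, `orders_cdc`) are hypothetical; the `streaming` and `monitor-interval` read options come from the Iceberg Flink connector:

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class IcebergStreamIo {
    public static void main(String[] args) {
        TableEnvironment tEnv =
            TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Requirement 1: read an Iceberg table's changes as a stream.
        // The connector polls for newly committed snapshots incrementally.
        tEnv.executeSql(
            "SELECT * FROM orders " +
            "/*+ OPTIONS('streaming'='true', 'monitor-interval'='10s') */")
            .print();

        // Requirement 2: write a change stream (e.g. a CDC source) to a table.
        tEnv.executeSql("INSERT INTO orders_sink SELECT * FROM orders_cdc");
    }
}
```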
  4. ⚠ CDC Streaming Writes
     Iceberg supports upsert writes, but Flink produces a high volume of retractions (deletes and inserts)
     • Equality deletes
       ◦ Optimized for write performance
       ◦ But slow at query time
     • Lots of delete files
       ◦ Small-files problem
       ◦ Slow at query time
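Upsert mode is enabled per table through standard Iceberg table properties. A hedged sketch (the table and the Hadoop catalog settings are hypothetical); with `write.upsert.enabled`, every update becomes an equality delete plus an insert, which is exactly where the delete-file pileup above comes from:

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class UpsertTableSetup {
    public static void main(String[] args) {
        TableEnvironment tEnv =
            TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // 'format-version' = '2' and 'write.upsert.enabled' = 'true' are
        // real Iceberg options; the rest of the setup is illustrative.
        tEnv.executeSql(
            "CREATE TABLE accounts (" +
            "  id BIGINT," +
            "  balance DECIMAL(10, 2)," +
            "  PRIMARY KEY (id) NOT ENFORCED" +
            ") WITH (" +
            "  'connector' = 'iceberg'," +
            "  'catalog-name' = 'demo'," +
            "  'catalog-type' = 'hadoop'," +
            "  'warehouse' = 'file:///tmp/warehouse'," +
            "  'format-version' = '2'," +
            "  'write.upsert.enabled' = 'true'" +
            ")");
    }
}
```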
  5. The Trick
     • Store change events in append tables
     • Merge/materialize changes from append tables on read
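One way to realize the trick, sketched below under assumptions: land every change record in a plain append table whose change type and event time are ordinary columns (the `op` and `change_ts` column names and the `account_cdc_source` table are hypothetical). The table stays append-only, so no delete files are ever written, yet it carries enough information to reconstruct state on read:

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class AppendChangeLog {
    public static void main(String[] args) {
        TableEnvironment tEnv =
            TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Plain append table: the change type ('I'/'U'/'D') and event time
        // are ordinary data columns.
        tEnv.executeSql(
            "CREATE TABLE account_changes (" +
            "  id BIGINT," +
            "  balance DECIMAL(10, 2)," +
            "  op STRING," +
            "  change_ts TIMESTAMP(3)" +
            ") WITH (" +
            "  'connector' = 'iceberg'," +
            "  'catalog-name' = 'demo'," +
            "  'catalog-type' = 'hadoop'," +
            "  'warehouse' = 'file:///tmp/warehouse'," +
            "  'format-version' = '2'" +
            ")");

        // Land every change event as an append; 'account_cdc_source' stands
        // in for whatever CDC source the pipeline actually ingests.
        tEnv.executeSql(
            "INSERT INTO account_changes " +
            "SELECT id, balance, op, change_ts FROM account_cdc_source");
    }
}
```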
  6. What about the query performance?
     • Records to merge grow over time
     • Can we do “compaction” similar to Kafka compacted topics?
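A periodic batch merge can play the role that log compaction plays for Kafka topics. A hedged sketch, reusing the hypothetical `account_changes` table from above: keep only the latest change per key, drop keys whose latest change is a delete, and overwrite a materialized table so readers merge against a bounded history:

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class BatchCompaction {
    public static void main(String[] args) {
        TableEnvironment tEnv =
            TableEnvironment.create(EnvironmentSettings.inBatchMode());

        // Rank changes per key by event time, keep the newest (rn = 1),
        // filter out tombstones, and overwrite the materialized table.
        tEnv.executeSql(
            "INSERT OVERWRITE accounts_materialized " +
            "SELECT id, balance FROM (" +
            "  SELECT *, ROW_NUMBER() OVER (" +
            "    PARTITION BY id ORDER BY change_ts DESC) AS rn" +
            "  FROM account_changes" +
            ") WHERE rn = 1 AND op <> 'D'");
    }
}
```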
  7. Row Lineage
     • Tracks changes to individual rows as they are updated, deleted, or inserted
       ◦ Foundation for incremental change processing
     • Not tracked for rows updated via equality deletes
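For context, the Iceberg V3 spec tracks lineage through per-row metadata fields, `_row_id` and `_last_updated_sequence_number`. How engines will expose them is still evolving, so the query below is speculative; the threshold value is an arbitrary example:

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class RowLineagePeek {
    public static void main(String[] args) {
        TableEnvironment tEnv =
            TableEnvironment.create(EnvironmentSettings.inBatchMode());

        // Speculative: assumes the engine surfaces the V3 lineage fields
        // as queryable metadata columns.
        tEnv.executeSql(
            "SELECT _row_id, _last_updated_sequence_number, id, balance " +
            "FROM accounts " +
            "WHERE _last_updated_sequence_number > 41").print();
    }
}
```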
  8. What does this mean for Flink?
     • Writer side
       ◦ Flink uses equality deletes
       ◦ Can’t track row lineage information
     • Reader side
       ◦ Needs to derive a stream of changes from row lineage information
       ◦ No development yet
     • TL;DR: the framework is there, but the solution is not ready
  9. Key Takeaways
     1. Incremental change processing with Iceberg V2 requires workarounds:
        a. Write change streams to append-only tables
        b. Read append tables as change streams
        c. Schedule batch merges to maintain performance
     2. Iceberg V3’s row lineage can make change processing easier
        a. But still needs more development in processing engines
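Putting takeaway 1b together: reading the append table back as a change stream can be approximated with Flink's streaming Top-N pattern, which emits an updating (changelog) stream in which each newer version of a key retracts the previous one. A sketch with the same hypothetical tables and columns as above:

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class ReadAppendAsChanges {
    public static void main(String[] args) {
        TableEnvironment tEnv =
            TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Streaming-read the append table, then keep the latest row per key.
        // Flink executes the ROW_NUMBER() top-1 pattern as an updating
        // stream: newer versions of a key retract the older ones.
        tEnv.executeSql(
            "CREATE TEMPORARY VIEW account_state AS " +
            "SELECT id, balance, op FROM (" +
            "  SELECT *, ROW_NUMBER() OVER (" +
            "    PARTITION BY id ORDER BY change_ts DESC) AS rn" +
            "  FROM account_changes " +
            "  /*+ OPTIONS('streaming'='true', 'monitor-interval'='10s') */" +
            ") WHERE rn = 1");

        // Downstream consumers now see a change stream derived from appends;
        // tombstoned keys are filtered out.
        tEnv.executeSql(
            "SELECT * FROM account_state WHERE op <> 'D'").print();
    }
}
```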