[COC_ASIA_2025] When Flink Meets Fluss: The Future of Streaming Warehouse

When Apache Flink Meets Apache Fluss (Incubating): The Future of Streaming Warehouse

Jark Wu
PMC member of Apache Flink
PPMC member of Apache Fluss (incubating)

Data Infra History
Batch: 2008 (1st batch storage), 2010 (de facto standard batch engine), 2017 (2nd batch storage)
Stream: 2011 (1st streaming storage), 2014 (de facto standard streaming engine), 2024 (2nd streaming storage)

Kafka is not designed for streaming analytics. Flink needs its “Iceberg” moment!

Apache Fluss (incubating): Streaming Storage for Real-Time Analytics
• Streaming Read & Write
• Real-Time Updates & Lookups
• Column & Partition Pruning
• Unified Stream & Lake
Architecture diagram:
• Sources: Databases, Logs, Images*, Videos*
• Writes: Streaming Writes, Real-Time Updates
• Fluss Cluster: Server, Server, Server; Remote Storage (S3 / OSS / HDFS)
• Tiering Service into Lakehouse Storage (Paimon / Iceberg* / Lance*)
• Reads: Streaming Reads, Batch Reads, Union Reads, Lookup Join, Lakehouse Analytics

History of Apache Fluss (incubating)
• 2023/07: Initiated by Flink Team at Alibaba
• 2023/12: First Introduction at FFA 2023
• 2024/11: Open Sourced at FFA 2024
• 2025/06: Joins ASF Incubator

Why Kafka is not a Good Storage for Flink

1. Integration Overview of Kafka vs. Fluss
Fluss: 360° Integration with Flink
• Diagram labels: Lookup Join, Dimension Table, Source Table, Sink Table, Catalog, Real-Time Write, Real-Time Update, Partial Update, CDC YAML, Streaming Read, Batch Read, Union Read, Full-Increment Read, Binlog Read, Metadata Read, Metadata Update
Kafka: Limited Integration with Flink
• Diagram labels: Source Table, Sink Table, Catalog, Lookup, Real-Time Write, Streaming Read, Metadata, Dimension Table

2. No Schema in Kafka vs. First-Class Schema in Fluss
• Kafka: schema is mapped manually
• Fluss: metadata is auto-discovered via Catalog

3. Lookup Joins in Kafka vs. Fluss
(Diagram labels: Lookup Join, Copy)
With Kafka
• Relies on an external KV store for Lookup Join
• Costly and complex pipeline
With Fluss
• Millions of lookup QPS
• Updated in real time
• Prefix Lookup (1:N)

4. Kafka is Not Queryable vs. Instant Query on Fluss
• Kafka: dump Kafka to an OLAP system for query & debug
• Fluss: query Fluss tables directly in real time

5. Kafka Topic is Not Shareable: Intermediate-Layer Data Cannot Be Reused
With Kafka
• Kafka (100G) feeds Flink Job 1 (Aggregation, Sink), Flink Job 2 (Join, Sink), and Flink Job 3 (TopN, Sink)
• Each job runs its own Changelog Normalize step (100G each)
• Kafka doesn't support updates or a changelog feed, so a Kafka topic is hard for downstream jobs to reuse. Each job has to deduplicate the Kafka topic in Flink state.
With Fluss
• Fluss (100G) feeds the same three Flink jobs over CDC
• Fluss tables are shareable across the data warehouse.
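
To make the cost of per-job deduplication concrete, here is a minimal conceptual sketch of a changelog-normalize step. It is not Flink's implementation: the function name, the record shapes, and the op codes used here are illustrative assumptions. The point it shows is that the step must remember the last row for every key, and that every job reading the same topic pays for that state again.

```python
from typing import Dict, Iterable, Iterator, Optional, Tuple

Record = Tuple[str, Optional[dict]]  # (key, value); a value of None means delete
Change = Tuple[str, str, dict]       # (op, key, row)


def changelog_normalize(upserts: Iterable[Record]) -> Iterator[Change]:
    """Turn a keyed upsert stream into a changelog (hypothetical helper).

    The last row per key is kept in `state`, so the state size grows with the
    number of distinct keys in the topic.
    """
    state: Dict[str, dict] = {}
    for key, value in upserts:
        previous = state.get(key)
        if value is None:
            # Delete: only meaningful if the key was seen before.
            if previous is not None:
                del state[key]
                yield ("-D", key, previous)
        elif previous is None:
            state[key] = value
            yield ("+I", key, value)
        elif previous != value:
            state[key] = value
            yield ("-U", key, previous)
            yield ("+U", key, value)
        # An identical value is a duplicate and is dropped.


events = [
    ("o1", {"amount": 10}),
    ("o1", {"amount": 10}),  # duplicate delivery
    ("o1", {"amount": 12}),
    ("o2", {"amount": 5}),
    ("o1", None),
]
changes = list(changelog_normalize(events))
ops = [op for op, _, _ in changes]
```

In this sketch the five input records produce six changelog entries, and the duplicate is absorbed by the state. A storage layer that already emits such a changelog lets downstream jobs skip this step.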

6. No I/O Pruning in Kafka vs. Column Pruning in Fluss
• Kafka: row-based read (Col1, Col2, Col3, Col4 are all read by Flink); based on Avro/JSON/CSV format
• Fluss: column-based read (Flink reads only the columns it needs); based on the Arrow IPC Streaming Format
• Fluss can achieve 10x the read performance of Kafka when only 10% of the columns are needed.
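
The 10x figure follows from simple I/O arithmetic, sketched below. The table shape (10 equal-width columns of 8 bytes, one million rows) and the function names are assumptions chosen for illustration; real gains depend on column widths, encoding, and compression.

```python
from typing import Dict, List


def row_based_bytes(column_sizes: Dict[str, int], rows: int) -> int:
    """Bytes read when every column of every row must be fetched."""
    return rows * sum(column_sizes.values())


def column_based_bytes(column_sizes: Dict[str, int], rows: int, needed: List[str]) -> int:
    """Bytes read when only the needed columns are fetched."""
    return rows * sum(column_sizes[name] for name in needed)


# Hypothetical table: 10 columns, 8 bytes per value, 1,000,000 rows.
sizes = {f"col{i}": 8 for i in range(1, 11)}
rows = 1_000_000

full = row_based_bytes(sizes, rows)
pruned = column_based_bytes(sizes, rows, ["col1"])  # 10% of the columns
ratio = full / pruned
```

Under these assumptions the row-based read fetches 80,000,000 bytes and the pruned read fetches 8,000,000 bytes, a ratio of 10.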

7. Kafka is Not Unified with Lakehouse: Not a Unified Stream-Batch Architecture
With Kafka
• Bronze, Silver, and Gold layers each hold a Kafka topic and an Iceberg table, connected by ETL and data copies
• Duplicated data copies in each layer
With Fluss (Shared Data)
• Real-Time Data Layer (short-term, second latency): Fluss Storage
• Historical Data Layer (long-term, minute latency): Lakehouse Storage (Paimon, Iceberg*)
• Data is shared between Fluss and Lakehouse
• Reduces streaming storage cost by 50%+
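
One way to picture a read over data shared between the two layers is the sketch below. It assumes, purely for illustration, that every record carries a log offset and that the lakehouse holds everything below a known tiered offset. The function name and the tuple layout are hypothetical and do not reflect the Fluss API.

```python
from typing import List, Tuple

LogRecord = Tuple[int, str]  # (offset, payload)


def union_read(
    lake_records: List[LogRecord],
    fluss_log: List[LogRecord],
    tiered_end_offset: int,
) -> List[LogRecord]:
    """Combine historical and real-time data without reading a record twice.

    Records below `tiered_end_offset` come from the lakehouse; records at or
    above it come from the streaming log.
    """
    historical = [r for r in lake_records if r[0] < tiered_end_offset]
    fresh = [r for r in fluss_log if r[0] >= tiered_end_offset]
    return historical + fresh


lake = [(0, "a"), (1, "b"), (2, "c")]
fluss = [(2, "c"), (3, "d"), (4, "e")]  # offset 2 is still retained in the log
result = union_read(lake, fluss, tiered_end_offset=3)
offsets = [offset for offset, _ in result]
```

Because the boundary offset decides which layer serves each record, the overlapping record at offset 2 appears once in the result.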

Production & Use Cases Of Apache Fluss (incubating)

Apache Fluss (Incubating) Production at Alibaba
• Total Data Size: 3 PB
• Ingest Throughput: 40 GB/s
• Highest Lookup QPS on a Single Table: 500 K
• Rows of Largest Table: 500 B

Use Case 1: Log Collection and Real-Time Analysis
The Challenge
Storage Cost
• Data keeps growing
• Retain longer than 3 days
Network Cost
• 1 write, 10 reads
• Expensive cross-AZ traffic
The Solution
Shared Data
• Reduce 30% data in Fluss
• Keep long term in Lakehouse
Column Pruning
• Columnar log format (Arrow)
• Partition Pruning and more
Chart (Use Kafka vs. Use Fluss): Total Cost 30%, Read Traffic 70%

Use Case 2: Delta Join for Large-Scale Streaming Join
• Streaming Join: page and order streams are joined into conversion, with large state
• Delta Join: page and order are joined into conversion via ① Streaming Read
• Alibaba donated Delta Join to Flink v2.1: FLIP-486
• Eliminates 100 TB of state; checkpoint time drops from 90s to 1s
• Fast Flink job updates without state bootstrap
• 85% lower Flink CPU & memory usage
• Inspect join data in real time via the Fluss Query API
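
The idea behind removing join state can be sketched as follows: each new record on one side probes the other side's table by join key instead of being buffered in the join operator. This is a conceptual illustration only, not the FLIP-486 design or the Fluss API. `IndexedTable` is a hypothetical stand-in for an external table that supports key lookups, and appending to it inside the loop simulates upstream writers.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Event = Tuple[str, str, str]  # (side, join_key, row)


class IndexedTable:
    """Hypothetical stand-in for an external table with lookup by join key."""

    def __init__(self) -> None:
        self._rows: Dict[str, List[str]] = defaultdict(list)

    def append(self, key: str, row: str) -> None:
        self._rows[key].append(row)

    def lookup(self, key: str) -> List[str]:
        return list(self._rows.get(key, []))


def delta_join(events: List[Event], left: IndexedTable, right: IndexedTable) -> List[Tuple[str, str]]:
    """Join each incoming delta against the other side's table.

    The join itself keeps no rows; all data lives in the two tables.
    """
    out: List[Tuple[str, str]] = []
    for side, key, row in events:
        if side == "left":
            left.append(key, row)  # simulates the upstream write
            out.extend((row, r) for r in right.lookup(key))
        else:
            right.append(key, row)  # simulates the upstream write
            out.extend((l, row) for l in left.lookup(key))
    return out


page, order = IndexedTable(), IndexedTable()
events = [
    ("left", "u1", "page:home"),
    ("right", "u1", "order:42"),
    ("left", "u1", "page:cart"),
    ("right", "u2", "order:7"),
]
conversion = delta_join(events, page, order)
```

Each matching pair is emitted exactly once, by whichever side arrives second, so the join operator never needs its own copy of either input.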

Future Roadmap
• Storage for Flink: de facto standard storage of Flink
• Streaming Lakehouse: more formats, more engines
• Multi-modal AI: streaming into Lance

Fluss Community
• Contributor
• Slack Workspace
• Monthly Community Call
• Community DingTalk group
• WeChat official account
• https://www.linkedin.com/company/apachefluss/
• https://github.com/apache/fluss

Thanks
Jark Wu
jarkwu@LinkedIn
