Exploring Modern Table Formats in Rust

Open Data Circle

November 17, 2025

Transcript

  1. Exploring Modern Table Formats in Rust: The Motivation of InnoTable

    Xiao Zhiyan – Open Data Circle, November 17th, 2025
  2. Xiao Zhiyan (@xiaozhiyan)

    Chongqing, China: 8D Magic City, Hot Pot. The Chinese University of Hong Kong: Mathematics, Information Engineering. LY Corporation: Software Engineer (Streaming Data Pipeline, Spark, Iceberg). Open Data Circle: Big Data x Rust, AI-native Streamhouse. Locations: Chongqing, Hong Kong, Tokyo.
  3. Agenda

    Table Formats: 1 Overview, 2 Landscape, 3 Key Features. Challenges: 4 Streaming-native, 5 AI-ready, 6 Lightweight Runtime. Solution: 7 Composable Design, 8 InnoTable Framework. The Road Ahead: 9 From AI-ready to AI-native, 10 AI-native Streamhouse.
  4. Table Formats: Landscape

    Apache Iceberg, Delta Lake, Apache Hudi, Apache Paimon, Lance. Industry standard; Strong metadata; Strong ACID; ML-friendly; Optimized writes; Incremental pipelines; Complex write path; Streaming-native; LSM-style storage; Low-latency upsert; Vector-first; Analytical workloads; Limited streaming; Spark-centric; Emerging ecosystem; Rust-powered; Lacks ACID.
  5. Table Formats: Key Features

    Metadata-driven architecture; Time travel for reproducibility; Schema evolution and versioning; ACID transactions with snapshot isolation.
  6. Challenges: Streaming-native

    Problem: Most table formats are designed for batch workloads. Expectation: High-throughput, low-latency streaming read and write. Current Situation: Apache Paimon (streaming-native design with LSM tree storage); Apache Fluss (streaming storage with Lakehouse tables as tiering service); Databricks Realtime Mode (one of the talks in the meetup today).
  7. Challenges: AI-ready

    Problem: Most table formats are designed without vector indexing. Expectation: Efficient vector indexing for AI workloads. Current Situation: External indexing (FAISS, Milvus, Weaviate, etc.); Lance (AI-native vector indexing; both table format and file format in Rust).
  8. Challenges: Lightweight Runtime

    Problem: JVM, high resource cost, warm-up latency, hard to embed. Expectation: Lightweight, edge-compatible, native performance; Robotics (<AI> smart algorithm + <Big Data> efficient data processing). Current Situation: DuckDB / DuckLake (C++) (one of the talks in the meetup today); DataFusion / Ballista (Rust); Polars (Rust); Lance (Rust); iceberg-rust, delta-rs, hudi-rs, paimon-rust (some still WIP).
  9. Solution: Composable Design

    To create a new monolithic table format (each time): Cost inefficient; Usually incompatible with each other. To implement a new table format with composable design: Reuse components; Reimplement target components; Add new features as plugin.
  10. The Road Ahead: From AI-ready to AI-native

    AI-ready Table Format: One-way flow: data ⇒ AI. AI-native Data Platform: Two-way flow: data ⇔ AI; Intent-driven data management; Self-optimizing data flows; Self-healing table maintenance.
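
The composable design from slide 9 (reuse components, reimplement target components, add new features as plugin) could look roughly like the Rust sketch below. This is only an illustration under stated assumptions: the deck does not show InnoTable's API, so every name here (`Storage`, `Plugin`, `Table`, `MapStorage`, `LogStorage`, `WriteCounter`) is hypothetical and kept in memory for brevity.

```rust
use std::collections::BTreeMap;

/// Hypothetical storage component: the part a new format may swap out.
trait Storage {
    fn put(&mut self, key: u64, value: String);
    fn get(&self, key: u64) -> Option<&str>;
}

/// Reused component: an ordered in-memory map.
#[derive(Default)]
struct MapStorage {
    rows: BTreeMap<u64, String>,
}

impl Storage for MapStorage {
    fn put(&mut self, key: u64, value: String) {
        self.rows.insert(key, value);
    }
    fn get(&self, key: u64) -> Option<&str> {
        self.rows.get(&key).map(String::as_str)
    }
}

/// Reimplemented component: an append-only log where the newest entry wins.
#[derive(Default)]
struct LogStorage {
    entries: Vec<(u64, String)>,
}

impl Storage for LogStorage {
    fn put(&mut self, key: u64, value: String) {
        self.entries.push((key, value));
    }
    fn get(&self, key: u64) -> Option<&str> {
        // Scan from the newest entry backwards so later writes shadow older ones.
        self.entries
            .iter()
            .rev()
            .find(|(k, _)| *k == key)
            .map(|(_, v)| v.as_str())
    }
}

/// Hypothetical plugin hook, called after every write.
trait Plugin {
    fn on_write(&mut self, key: u64, value: &str);
    fn report(&self) -> String;
}

/// Example plugin: counts writes without touching any storage code.
#[derive(Default)]
struct WriteCounter {
    writes: usize,
}

impl Plugin for WriteCounter {
    fn on_write(&mut self, _key: u64, _value: &str) {
        self.writes += 1;
    }
    fn report(&self) -> String {
        format!("writes={}", self.writes)
    }
}

/// A table assembled from one storage component plus optional plugins.
struct Table<S: Storage> {
    storage: S,
    plugins: Vec<Box<dyn Plugin>>,
}

impl<S: Storage> Table<S> {
    fn new(storage: S) -> Self {
        Table { storage, plugins: Vec::new() }
    }
    fn with_plugin(mut self, plugin: Box<dyn Plugin>) -> Self {
        self.plugins.push(plugin);
        self
    }
    fn upsert(&mut self, key: u64, value: &str) {
        self.storage.put(key, value.to_string());
        // Notify every registered plugin about the write.
        for plugin in self.plugins.iter_mut() {
            plugin.on_write(key, value);
        }
    }
    fn get(&self, key: u64) -> Option<&str> {
        self.storage.get(key)
    }
    fn reports(&self) -> Vec<String> {
        self.plugins.iter().map(|p| p.report()).collect()
    }
}
```

In this sketch, `Table::new(MapStorage::default())` and `Table::new(LogStorage::default())` share the same `Table` code and differ only in the storage component, while `.with_plugin(Box::new(WriteCounter::default()))` adds a feature without modifying either. A real format would put metadata, file layout, and indexing behind similar seams, but where those seams belong is a design decision the slides leave open.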