object storage
• Multiple apps
  ◦ Producing, accessing, and processing data
• Multiple use cases
  ◦ Data engineering
  ◦ ML / AI
  ◦ Reporting

4 Lakehouse - Overview
do you find the data you need?
• Access
  ◦ How do I gain access to data?
• Observability
  ◦ Who is accessing data?
  ◦ What is accessed + how?
• Lineage
  ◦ How was this data produced?

5 Lakehouse - Zooming in
through Hive and see your tables
• Access
• Tells me the following so that I know how to interact with the table:
  ◦ Location
  ◦ Format
  ◦ Schema
  ◦ IAM permissions required to access the storage locations

7 Solving how engines find out datasets
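The lookup described above can be sketched as follows. This is an illustrative toy, not a real Hive client: the catalog is an in-memory dict, and all table names, paths, and schemas are hypothetical. It shows the three pieces of metadata an engine needs before it can read a table.

```python
# Illustrative sketch: the kind of metadata a Hive-style catalog hands
# back so an engine knows how to interact with a table.
# All names and values are hypothetical.

def lookup_table(catalog: dict, schema: str, table: str) -> dict:
    """Return the metadata an engine needs to read a table."""
    return catalog[f"{schema}.{table}"]

# Toy in-memory "catalog" holding the slide's three items:
# location, format, and schema.
catalog = {
    "sales.orders": {
        "location": "s3://warehouse/sales/orders/",  # where the files live
        "format": "iceberg",                         # how to parse them
        "schema": [("order_id", "bigint"), ("amount", "decimal(10,2)")],
    }
}

meta = lookup_table(catalog, "sales", "orders")
print(meta["format"])    # engine picks a reader based on this
print(meta["location"])  # IAM permissions gate access to this path
```

The engine never scans storage to discover tables; it asks the catalog, then uses the returned location and format to plan the read.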
exposed through a REST API
• Single client to talk to any custom catalog backend
• Shifting responsibility from client to catalog
  ◦ Metadata file generation

Hadoop / Hive / JDBC / Nessie / etc. → Iceberg REST

9 Different Catalog Implementations
https://iceberg.apache.org/concepts/catalog/?h=catalog#overview

spark.read
  .format("iceberg")
  .load("hdfs://host:8020/catalog/schema/table");
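The "single client" idea can be sketched as follows: the client only builds uniform HTTP requests, while the server (backed by Hive, JDBC, Nessie, or anything else) does the backend-specific work such as metadata file generation. The endpoint shape follows the Iceberg REST catalog specification; the host and table names are made up.

```python
# Sketch of addressing any REST-catalog backend through one uniform API.
# BASE is a hypothetical endpoint; the path shape follows the Iceberg
# REST catalog spec's loadTable operation.

BASE = "https://catalog.example.com/v1"

def load_table_url(namespace: str, table: str) -> str:
    # GET /v1/namespaces/{namespace}/tables/{table} returns the table's
    # current metadata, so the client never has to locate or parse
    # metadata files itself.
    return f"{BASE}/namespaces/{namespace}/tables/{table}"

print(load_table_url("schema", "table"))
# https://catalog.example.com/v1/namespaces/schema/tables/table
```

Because every backend speaks the same HTTP surface, swapping Hive for Nessie requires no client-side change, only a different base URI.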
AI assets
• Multi-format: support any table format, incl. Delta, Iceberg, Parquet, CSV, JSON
• Unified: single catalog which can govern access across your entire data estate
[Diagram: Open Lakehouse for Data + AI. A catalog governing Tables and Views, Objects (Image, Audio, PDF, Parquet, CSV, JSON), and AI/ML assets across engines and platforms (Microsoft Fabric, Google Cloud, LlamaIndex), providing Governance, Discovery, Lineage, and Observability.]
Horizon / REST / HMS

Unify the lakehouse with Databricks Unity Catalog
• Write and read from any Iceberg client using open APIs (Unity or Iceberg REST)
• Access and govern data in Foreign Catalogs from Unity Catalog (and vice versa)

Iceberg Clients
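A minimal sketch of what "any Iceberg client using open APIs" looks like from Spark's side, assuming Unity Catalog's Iceberg REST endpoint. The catalog name, workspace URL, endpoint path, and token below are placeholders; consult your workspace documentation for the exact REST URI.

```properties
# Hypothetical Spark conf pointing an Iceberg REST catalog at Unity Catalog.
# <workspace-url> and <personal-access-token> are placeholders.
spark.sql.catalog.unity         org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.unity.type    rest
spark.sql.catalog.unity.uri     https://<workspace-url>/api/2.1/unity-catalog/iceberg
spark.sql.catalog.unity.token   <personal-access-token>
```

With this in place, any Iceberg-capable Spark session can address governed tables as `spark.table("unity.schema.table")`, without a Databricks-specific client.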
Unification
• Partnership with the Delta and Iceberg communities to unify the formats
• Consistent data and delete files for flexibility and performance
• Aligned table features to track row-level changes between versions of a table