Slide 1

Slide 1 text

Iceberg V3 Spec
ICEBERG JAPAN MEETUP ON 2025 FEB. 21
Tomohiro Tanaka
Senior Cloud Support Engineer
Amazon Web Services
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.

Slide 2

Slide 2 text

Tomohiro Tanaka
Senior Cloud Support Engineer, AWS Support, Amazon Web Services
• Responsible for resolving the most complex issues and guiding best practices with Iceberg
• Contributing to the Iceberg OSS project

Slide 3

Slide 3 text

Agenda
• What is Apache Iceberg?
• Iceberg format version
• New features in V3 Spec

Slide 4

Slide 4 text

What is Apache Iceberg?

Slide 5

Slide 5 text

What is Apache Iceberg?
• An open table format designed for huge analytics datasets
• Built at Netflix (by Ryan Blue and Daniel Weeks) in 2017 and donated to the Apache Software Foundation in 2018
• Offers ACID capabilities to data lakes
• Enables data analysis using familiar SQL syntax through processing engines
• Use-cases: streaming ingestion, complying with data regulations, CDC etc.

Slide 6

Slide 6 text

Iceberg works between a computing engine and storage
Diagram layers (top to bottom):
• Query engine layer
• Iceberg (Metadata Layer)
• Data Layer (Amazon S3)
Read/Write data through Iceberg with ACID transactions

Slide 7

Slide 7 text

Iceberg table architecture
Example: Run CREATE TABLE and INSERT INTO
Query engine: 1) CREATE TABLE 2) INSERT INTO
Catalog: db - tbl
Storage:
bucket/
- path/warehouse
- 00000-UUID.metadata.json
- 00001-UUID.metadata.json
- snap-SnapshotID-N-CommitUUID.avro
- CommitUUID-mN.avro
- data.zstd.parquet

Slide 8

Slide 8 text

Iceberg table architecture – 3 layers
Catalog: The entry point of data access
• Storing the current table metadata pointer as metadata_location
• Providing atomic operations
Metadata (in Storage): A key layer to provide the Iceberg features
• 3 types of files: metadata files, manifest lists, manifest files
Data (in Storage): Storing data as Parquet files (by default)

Slide 9

Slide 9 text

Iceberg table architecture – Write flow 1/2
1) Access the catalog
2) Check the current metadata location
3) Write data by query engine
Diagram:
• Catalog: db - tbl (metadata_location)
• Metadata: 00000-UUID.metadata.json (Metadata file)
• Data: data.zstd.parquet

Slide 10

Slide 10 text

Iceberg table architecture – Write flow 2/2
4) Create the manifest files pointing to the data files
5) Create the manifest list
6) Create the new metadata file
7) Switch the pointer
Diagram:
• Catalog: db - tbl (metadata_location)
• Metadata: 00000-UUID.metadata.json, 00001-UUID.metadata.json (Metadata file); snap-SnapshotID-N-CommitUUID.avro (Manifest list); CommitUUID-mN.avro (Manifest file)
• Data: data.zstd.parquet

Slide 11

Slide 11 text

Iceberg table architecture – Read flow
1) Access the catalog
2) Check the current metadata location
3) Read each file
4) Read data by query engine
Diagram:
• Catalog: db - tbl (metadata_location)
• Metadata: 00000-UUID.metadata.json, 00001-UUID.metadata.json (Metadata file); snap-SnapshotID-N-CommitUUID.avro (Manifest list); CommitUUID-mN.avro (Manifest file)
• Data: data.zstd.parquet
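The write and read flows on these slides can be sketched as a toy, in-memory model. This is a minimal sketch under stated assumptions, not how any real catalog or engine is implemented: `storage`, `catalog`, `commit_append`, and `read_table` are hypothetical names, and plain Python dicts and lists stand in for Amazon S3, Avro manifests, and Parquet files.

```python
from typing import Any, Dict, List

storage: Dict[str, Any] = {}   # path -> file content (stand-in for Amazon S3)
catalog: Dict[str, str] = {}   # table name -> metadata_location


def commit_append(table: str, version: int, rows: List[dict]) -> bool:
    """Append rows following write-flow steps 1-7 (hypothetical helper)."""
    expected = catalog.get(table)                  # 1-2) read the current pointer
    data_path = f"data/{version:05d}-data.parquet"
    storage[data_path] = rows                      # 3) write the data file
    manifest_path = f"metadata/{version:05d}-m0.avro"
    storage[manifest_path] = [data_path]           # 4) manifest file -> data files
    previous_list = storage[expected]["manifest_list"] if expected else None
    older_manifests = storage[previous_list] if previous_list else []
    list_path = f"metadata/snap-{version:05d}.avro"
    storage[list_path] = older_manifests + [manifest_path]   # 5) manifest list
    metadata_path = f"metadata/{version:05d}.metadata.json"
    storage[metadata_path] = {"manifest_list": list_path}    # 6) new metadata file
    # 7) switch the pointer; a real catalog does this check-and-set atomically
    if catalog.get(table) != expected:
        return False
    catalog[table] = metadata_path
    return True


def read_table(table: str) -> List[dict]:
    """Follow catalog -> metadata file -> manifest list -> manifests -> data."""
    metadata = storage[catalog[table]]
    rows: List[dict] = []
    for manifest_path in storage[metadata["manifest_list"]]:
        for data_path in storage[manifest_path]:
            rows.extend(storage[data_path])
    return rows


commit_append("db.tbl", 1, [{"id": 1, "drink": "milk"}])
commit_append("db.tbl", 2, [{"id": 2, "drink": "cocoa"}])
print(read_table("db.tbl"))
```

Readers only ever start from the catalog pointer, so files written in steps 3 to 6 stay invisible until step 7 succeeds.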

Slide 12

Slide 12 text

Iceberg format version

Slide 13

Slide 13 text

Iceberg is a "table format"
The specification of Iceberg's table format is defined in https://iceberg.apache.org/spec/
• Specs for data types, reserved fields, partitioning, data deletion etc.
Processing engines need to follow the specification in their implementations.
• Processing engines: Apache Spark, Trino, Apache Flink, Apache Hive etc.

Slide 14

Slide 14 text

What is "format version"?
The Iceberg table format specification is versioned as the format version.
• When you create an Iceberg table, the default (or specified) format version is recorded in the Iceberg table metadata (metadata.json).
• The current format version is V2, and V3 is under development.

$ cat 00000-1eb8c96e-f503-4ff9-b4e0-53cb3ede0116.metadata.json
{
  "format-version" : 2,
  "table-uuid" : "eaf5dec9-7866-49a5-81c6-11af8f344e1f",
  "location" : "s3://bucket/iceberg-warehouse",
  "last-sequence-number" : 0,
  ...
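As a small illustration of where the format version lives, the following sketch reads the "format-version" field from a metadata file's JSON text. The embedded JSON is an abbreviated, hypothetical copy of the example above, and `read_format_version` is an illustrative helper, not an Iceberg API.

```python
import json

# Abbreviated, hypothetical metadata.json content modeled on the example above.
METADATA_JSON = """
{
  "format-version": 2,
  "table-uuid": "eaf5dec9-7866-49a5-81c6-11af8f344e1f",
  "location": "s3://bucket/iceberg-warehouse",
  "last-sequence-number": 0
}
"""


def read_format_version(metadata_text: str) -> int:
    """Return the table's format version from metadata.json text."""
    metadata = json.loads(metadata_text)
    return int(metadata["format-version"])


print(f"format-version = {read_format_version(METADATA_JSON)}")
```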

Slide 15

Slide 15 text

How is the format version maintained?
The format version is incremented when older readers cannot read new features correctly:
• Not forward-compatible: older readers cannot correctly read newer tables
• Backward-compatible: newer readers can read older tables
The spec for a specific version is finalized through a community vote after the spec and implementation of its major features are complete.
• (e.g.) Vote to finalize the V2 spec: https://lists.apache.org/thread/ws2gg52d124p7bx9jgrn3kctrtfgtltp
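The compatibility rule above can be expressed as a one-line check. This is a minimal sketch: `can_read` and the idea of a single "maximum supported version" per reader are simplifying assumptions for illustration, not part of any engine's API.

```python
def can_read(reader_max_version: int, table_format_version: int) -> bool:
    """A reader handles tables at or below the newest version it implements."""
    return table_format_version <= reader_max_version


# Backward-compatible: a V3-capable reader can read a V2 table.
print(can_read(reader_max_version=3, table_format_version=2))
# Not forward-compatible: a V2-only reader must reject a V3 table.
print(can_read(reader_max_version=2, table_format_version=3))
```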

Slide 16

Slide 16 text

Main concept and history in each format version
V1: Analytic Data Table, introducing Iceberg's fundamental table mechanisms
V2: Row-level deletes
• 2020 Apr. (in Iceberg 0.7.0): V2 spec and implementation started
• 2021 Aug. (in Iceberg 0.12.0): V2 spec was finalized
• 2023 Aug. (in Iceberg 1.4.0): V2 became the default version (PR: #8381)
• See https://github.com/apache/iceberg/milestone/7
V3: Extend data types and metadata fields for new capabilities
• 2021 Aug.: V3 spec discussion started after the V2 spec was finalized
• The spec has not been finalized yet, but implementation has already started
• See https://github.com/apache/iceberg/milestone/42

Slide 17

Slide 17 text

New features in Iceberg V3

Slide 18

Slide 18 text

New features in Iceberg V3 (as of 2025 Feb. 21)

Extended data types
Feature | Spec | Impl
Unknown | #10955 | #12012
Variant | #10831 | #11831 including for Spark
Timestamp (9) w/ or w/o tz | #8683 | #9008
Geo (geometry/geography) | #10981 | #12346 (not merged), #12347 for Parquet (not merged)
New type promotions | #10955 | Not yet

New capabilities
Feature | Spec | Impl
Deletion Vectors | #11240 | Tracked in Issue #11122
Row Lineage | #11130 | #11948 (partially implemented)
Default value | (Issue #10761) | #11785 (for Parquet), #11786 (for Avro), #11803 (for Spark)
Multi-args transform for partitioning and sorting | #8579, plus #9661 | Not yet

Slide 19

Slide 19 text

New features in Iceberg V3: Deletion Vectors

Slide 20

Slide 20 text

Deletion Vectors
• Introduced in Iceberg V3
• Files storing row-level deleted records for the corresponding data files
Compared to the V2 deletes:
• Less write amplification than previous deletes
• Better compression efficiency than previous deletes
• Faster query performance

Slide 21

Slide 21 text

Row level deletes
Assuming one Parquet file containing the following records:
+---+--------+-----+
| id|   drink|price|
+---+--------+-----+
|  1|    milk|    3|
|  2|   cocoa|    4|
|  3|espresso|    5|
+---+--------+-----+
DELETE FROM db.tbl WHERE id = 2

Slide 22

Slide 22 text

V1 deletes
metadata/
  metadata.json files
  manifest lists, manifests
data/
  00001-abcd.parquet
  00002-efgh.parquet

00001-abcd.parquet (original file):
+---+--------+-----+
| id|   drink|price|
+---+--------+-----+
|  1|    milk|    3|
|  2|   cocoa|    4|
|  3|espresso|    5|
+---+--------+-----+

00002-efgh.parquet (rewritten file without the deleted row):
+---+--------+-----+
| id|   drink|price|
+---+--------+-----+
|  1|    milk|    3|
|  3|espresso|    5|
+---+--------+-----+

Slide 23

Slide 23 text

Row level deletes in V2
V1:
metadata/
  metadata.json files
  manifest lists, manifests
data/
  00001-abcd.parquet
  00002-efgh.parquet
V2:
metadata/
  metadata.json files
  manifest lists, manifests
data/
  00000-abcd.parquet
  00000-efgh-deletes.parquet (Position delete)

Slide 24

Slide 24 text

Row level deletes in V2
V1:
metadata/
  metadata.json files
  manifest lists, manifests
data/
  00001-abcd.parquet
  00002-efgh.parquet
V2:
metadata/
  metadata.json files
  manifest lists, manifests
data/
  00000-abcd.parquet
  00000-efgh-deletes.parquet

00000-abcd.parquet:
+---+--------+-----+
| id|   drink|price|
+---+--------+-----+
|  1|    milk|    3|
|  2|   cocoa|    4|
|  3|espresso|    5|
+---+--------+-----+

00000-efgh-deletes.parquet:
+--------------------------------------+---+
|file_path                             |pos|
+--------------------------------------+---+
|s3://bucket/v2/data/00000-abcd.parquet|1  |
+--------------------------------------+---+
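The difference between the V1 rewrite and the V2 position delete can be sketched with the same three rows. This is an illustrative model only: Python lists stand in for Parquet files, positions are 0-based as in the `pos` column above, and `copy_on_write_delete` and `merge_on_read` are hypothetical helper names.

```python
DATA_FILE = "s3://bucket/v2/data/00000-abcd.parquet"

rows = [
    {"id": 1, "drink": "milk", "price": 3},
    {"id": 2, "drink": "cocoa", "price": 4},
    {"id": 3, "drink": "espresso", "price": 5},
]


def copy_on_write_delete(rows, predicate):
    """V1 style: rewrite the whole data file without the matching rows."""
    return [row for row in rows if not predicate(row)]


def merge_on_read(data_file, rows, position_deletes):
    """V2 style: keep the data file and skip positions listed in the delete file."""
    deleted = {d["pos"] for d in position_deletes if d["file_path"] == data_file}
    return [row for pos, row in enumerate(rows) if pos not in deleted]


# DELETE FROM db.tbl WHERE id = 2
rewritten = copy_on_write_delete(rows, lambda row: row["id"] == 2)
position_deletes = [{"file_path": DATA_FILE, "pos": 1}]
visible = merge_on_read(DATA_FILE, rows, position_deletes)

print([row["id"] for row in rewritten])
print([row["id"] for row in visible])
```

Both approaches expose the same rows to readers; the V2 approach writes only a small delete file instead of rewriting the data file.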

Slide 25

Slide 25 text

Row level deletes in V3
V2:
metadata/
  metadata.json files
  manifest lists, manifests
data/
  00000-abcd.parquet
  00000-efgh-deletes.parquet
V3:
metadata/
  metadata.json files
  manifest lists, manifests
data/
  00000-abcd.parquet
  00000-abcd-deletes.puffin (Deletion vector)

Slide 26

Slide 26 text

Row level deletes in V3
V2:
metadata/
  metadata.json files
  manifest lists, manifests
data/
  00000-abcd.parquet (1 KiB)
  00000-efgh-deletes.parquet (1.5 KiB)
V3:
metadata/
  metadata.json files
  manifest lists, manifests
data/
  00000-abcd.parquet (1 KiB)
  00000-abcd-deletes.puffin (0.5 KiB)

Slide 27

Slide 27 text

Puffin format and deletion vector

$ cat 00000-abcd-deletes.puffin
PFA1 " 9d:0 PFA1 {"blobs":[{"type":"deletion-vector-v1","fields":[2147483645],"snapshot-id":-1,"sequence-number":-1,"offset":4,"length":42,"properties":{"referenced-data-file":"s3://bucket/v3/data/00000-5086-bf58b80c-84ad-4b34-bbb3-02e144a39bcb-0-00001.parquet","cardinality":"1"}}],"properties":{"created-by":"Iceberg Apache Iceberg 1.8.0"}} PFA1

File layout (Puffin spec):
• Magic number: 0x50, 0x46, 0x41, 0x31 ("PFA1")
• Blobs: Roaring bitmap based serialization (for the position vector)
• Footer:
  - Magic number: 0x50, 0x46, 0x41, 0x31
  - FooterPayload (JSON)
  - FooterPayloadSize (4 bytes)
  - Magic number
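To make the footer layout concrete, here is a minimal sketch that builds a Puffin-like byte string and reads the JSON footer back. Two details are assumptions taken from my reading of the Puffin spec rather than from the slide: FooterPayloadSize is a little-endian 4-byte integer, and a 4-byte Flags field sits between FooterPayloadSize and the trailing magic (all zero here, meaning an uncompressed footer). `build_puffin` and `read_footer` are hypothetical helpers, and the blob bytes are placeholders, not a real deletion vector.

```python
import json
import struct

MAGIC = b"PFA1"  # 0x50, 0x46, 0x41, 0x31


def build_puffin(blob_bytes: bytes, footer: dict) -> bytes:
    """Assemble: Magic, Blobs, then Footer (Magic, payload, size, flags, Magic)."""
    payload = json.dumps(footer).encode("utf-8")
    flags = b"\x00\x00\x00\x00"  # assumption: no flags set, footer uncompressed
    return (MAGIC + blob_bytes + MAGIC + payload
            + struct.pack("<i", len(payload)) + flags + MAGIC)


def read_footer(data: bytes) -> dict:
    """Locate the footer from the end of the file and parse its JSON payload."""
    if data[:4] != MAGIC or data[-4:] != MAGIC:
        raise ValueError("not a Puffin file")
    (payload_size,) = struct.unpack("<i", data[-12:-8])
    payload_end = len(data) - 12          # skip size, flags, trailing magic
    payload_start = payload_end - payload_size
    if data[payload_start - 4:payload_start] != MAGIC:
        raise ValueError("footer magic not found")
    return json.loads(data[payload_start:payload_end].decode("utf-8"))


footer = {
    "blobs": [{"type": "deletion-vector-v1", "offset": 4, "length": 42,
               "properties": {"cardinality": "1"}}],
    "properties": {},
}
puffin_bytes = build_puffin(b"\x00" * 42, footer)  # 42 placeholder blob bytes
parsed = read_footer(puffin_bytes)
print(parsed["blobs"][0]["type"], parsed["blobs"][0]["offset"])
```

Reading from the end of the file lets a reader find every blob's offset and length without scanning the blobs themselves.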

Slide 28

Slide 28 text

Future of deletes
In Iceberg V3,
• position deletes are deprecated and replaced with deletion vectors
• equality deletes still exist, but the community has discussed deprecating them in https://lists.apache.org/thread/6fhpjszsfxd8p0vfzc3k5vw7zmcyv2mq

Slide 29

Slide 29 text

New features in Iceberg V3: Row Lineage

Slide 30

Slide 30 text

Row Lineage
Tracks changes for rows in an Iceberg table
• Use-cases: CDC, incremental data processing, audit logs, MV maintenance etc.
Added in Iceberg 1.8.0
Enabled by setting the Iceberg table property row-lineage to true
Considerations:
• Once enabled, it cannot be disabled.
• If your Iceberg table includes equality deletes, you cannot enable it.
  → Equality deletes are also allowed for Row Lineage (see the spec #12230)

Slide 31

Slide 31 text

Row Lineage mechanism
The following fields are added to track row changes:
• _row_id: each row's identifier
• _last_updated_sequence_number: the commit timing for a specific row
Additional fields are used for the row checkpoints:
• next-row-id: (= first-row-id + added_rows_count) the row checkpoint
• first-row-id: (for now) stored in metadata
• added/existing/deleted_rows_count: stored in manifest lists
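The checkpoint arithmetic (next-row-id = first-row-id + added_rows_count) can be sketched as follows. This is a simplified model, not the library's implementation: `assign_row_ids` is a hypothetical helper, and the 75-row and 25-row manifests are illustrative inputs.

```python
from typing import List, Tuple


def assign_row_ids(next_row_id: int,
                   added_rows_counts: List[int]) -> Tuple[List[range], int]:
    """Give each manifest a contiguous _row_id range and advance the checkpoint."""
    ranges = []
    first_row_id = next_row_id
    for added_rows_count in added_rows_counts:
        ranges.append(range(first_row_id, first_row_id + added_rows_count))
        first_row_id += added_rows_count
    return ranges, first_row_id  # the second value is the new next-row-id


# One commit inserting 100 records split across two manifests (75 + 25 rows).
row_id_ranges, new_next_row_id = assign_row_ids(0, [75, 25])
for ids in row_id_ranges:
    print(f"_row_id {ids.start} .. {ids.stop - 1}")
print(f"next-row-id = {new_next_row_id}")
```

Because every commit starts from the stored checkpoint, row IDs never overlap between commits.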

Slide 32

Slide 32 text

Summary
• Iceberg is a table format that defines the specification of its features.
• The specification is versioned as format-version. The current version is V2, and V3 is in development.
• Iceberg V3 introduces extended data types and new capabilities:
  • New data types such as unknown, variant, timestamp nano, geo
  • New capabilities such as deletion vectors, row lineage, multi-args transforms for partitioning and sorting, default values etc.

Slide 33

Slide 33 text

Questions

Slide 34

Slide 34 text

How are puffin files for Deletion Vectors updated?
A new Puffin file is created with a new DV, and the previous DV is also added to the new Puffin file.
Example: The current Iceberg table records and the relevant metadata and data files
+---+--------+-----+
|id |drink   |price|
+---+--------+-----+
|1  |milk    |3    |
|2  |cocoa   |4    |
|3  |espresso|5    |
+---+--------+-----+
s3://bucket/iceberg-v3-spec/dv_puff/
- metadata/00000-89225cb5-ae9f-4f76-921b-d9977784303e.metadata.json
- metadata/00001-749e89a3-208f-47a4-b377-fee2a4c7f868.metadata.json
- metadata/907196cf-b487-4ee5-8261-46991c0cbb01-m0.avro
- metadata/snap-992865481207666524-1-907196cf-b487-4ee5-8261-46991c0cbb01.avro
- data/00000-3-896a8c5a-94dd-4c67-a040-f0a8dc455707-0-00001.parquet

Slide 35

Slide 35 text

How are puffin files for Deletion Vectors updated?
A new Puffin file is created with a new DV, and the previous DV is also added to the new Puffin file.
Example: Run DELETE FROM db.tbl WHERE id = 2
+---+--------+-----+
|id |drink   |price|
+---+--------+-----+
|1  |milk    |3    |
|2  |cocoa   |4    |
|3  |espresso|5    |
+---+--------+-----+
s3://bucket/iceberg-v3-spec/dv_puff/
- metadata/00000-89225cb5-ae9f-4f76-921b-d9977784303e.metadata.json
- metadata/00001-749e89a3-208f-47a4-b377-fee2a4c7f868.metadata.json
- metadata/00002-80aef5fd-91a0-4622-9b16-8902f08ba32a.metadata.json
- metadata/907196cf-b487-4ee5-8261-46991c0cbb01-m0.avro
- metadata/8c81a87a-61ad-4084-8c2d-63b56a9bafff-m0.avro
- metadata/snap-992865481207666524-1-907196cf-b487-4ee5-8261-46991c0cbb01.avro
- metadata/snap-2280167229853693236-1-8c81a87a-61ad-4084-8c2d-63b56a9bafff.avro
- data/00000-3-896a8c5a-94dd-4c67-a040-f0a8dc455707-0-00001.parquet
- data/00000-7-8eaacff6-3a7c-4e08-9fb5-5355576005c9-00001-deletes.puffin

Slide 36

Slide 36 text

How are puffin files for Deletion Vectors updated?
A new Puffin file is created with a new DV, and the previous DV is also added to the new Puffin file.
Example: Run DELETE FROM db.tbl WHERE id = 1
+---+--------+-----+
|id |drink   |price|
+---+--------+-----+
|1  |milk    |3    |
|3  |espresso|5    |
+---+--------+-----+
s3://bucket/iceberg-v3-spec/dv_puff/
- metadata/00000-89225cb5-ae9f-4f76-921b-d9977784303e.metadata.json
- metadata/00001-749e89a3-208f-47a4-b377-fee2a4c7f868.metadata.json
- metadata/00002-80aef5fd-91a0-4622-9b16-8902f08ba32a.metadata.json
- metadata/00003-10439861-354c-4c79-a50d-77f6cd7fbd9b.metadata.json
- metadata/907196cf-b487-4ee5-8261-46991c0cbb01-m0.avro
- metadata/8c81a87a-61ad-4084-8c2d-63b56a9bafff-m0.avro
- metadata/fae34f8f-862d-43f1-bea7-d794e65150ab-m0.avro
- metadata/fae34f8f-862d-43f1-bea7-d794e65150ab-m1.avro
- metadata/snap-992865481207666524-1-907196cf-b487-4ee5-8261-46991c0cbb01.avro
- metadata/snap-2280167229853693236-1-8c81a87a-61ad-4084-8c2d-63b56a9bafff.avro
- metadata/snap-4965038673004682275-1-fae34f8f-862d-43f1-bea7-d794e65150ab.avro
- data/00000-3-896a8c5a-94dd-4c67-a040-f0a8dc455707-0-00001.parquet
- data/00000-7-8eaacff6-3a7c-4e08-9fb5-5355576005c9-00001-deletes.puffin
- data/00000-10-1d66d4f5-8ed9-4c97-ac98-9410134b5fbf-00001-deletes.puffin
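The update rule described on these slides (each new DV also carries the previously deleted positions) can be modeled with plain sets. This is a sketch under stated assumptions: a Python frozenset stands in for the Roaring bitmap, positions are 0-based within the single data file, and `next_deletion_vector` is a hypothetical helper name.

```python
rows = [
    {"id": 1, "drink": "milk", "price": 3},
    {"id": 2, "drink": "cocoa", "price": 4},
    {"id": 3, "drink": "espresso", "price": 5},
]


def next_deletion_vector(previous_dv: frozenset, newly_deleted) -> frozenset:
    """The DV written to the new Puffin file: previous positions plus new ones."""
    return previous_dv | frozenset(newly_deleted)


def positions_where(rows, predicate, dv):
    """Positions of still-visible rows that match the DELETE predicate."""
    return [pos for pos, row in enumerate(rows)
            if pos not in dv and predicate(row)]


dv0 = frozenset()
# DELETE FROM db.tbl WHERE id = 2  -> first Puffin file
dv1 = next_deletion_vector(dv0, positions_where(rows, lambda r: r["id"] == 2, dv0))
# DELETE FROM db.tbl WHERE id = 1  -> second Puffin file, includes the previous DV
dv2 = next_deletion_vector(dv1, positions_where(rows, lambda r: r["id"] == 1, dv1))

visible = [row["id"] for pos, row in enumerate(rows) if pos not in dv2]
print(sorted(dv1), sorted(dv2), visible)
```

Since the newest DV is a superset of the older one, a reader needs only the latest DV for a data file to know which rows to skip.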

Slide 37

Slide 37 text

Manifest files
Newly generated manifest files contain the file paths of the Puffin files.

avro-tools tojson fae34f8f-862d-43f1-bea7-d794e65150ab-m0.avro | jq ".data_file |{file_path: .file_path, file_format: .file_format, referenced_data_file: .referenced_data_file}"
{
  "file_path": "s3://bucket/iceberg-v3-spec/dv_puff/data/00000-7-8eaacff6-3a7c-4e08-9fb5-5355576005c9-00001-deletes.puffin",
  "file_format": "PUFFIN",
  "referenced_data_file": {
    "string": "s3://bucket/iceberg-v3-spec/dv_puff/data/00000-3-896a8c5a-94dd-4c67-a040-f0a8dc455707-0-00001.parquet"
  }
}

avro-tools tojson fae34f8f-862d-43f1-bea7-d794e65150ab-m1.avro | jq ".data_file |{file_path: .file_path, file_format: .file_format, referenced_data_file: .referenced_data_file}"
{
  "file_path": "s3://bucket/iceberg-v3-spec/dv_puff/data/00000-10-1d66d4f5-8ed9-4c97-ac98-9410134b5fbf-00001-deletes.puffin",
  "file_format": "PUFFIN",
  "referenced_data_file": {
    "string": "s3://bucket/iceberg-v3-spec/dv_puff/data/00000-3-896a8c5a-94dd-4c67-a040-f0a8dc455707-0-00001.parquet"
  }
}

Slide 38

Slide 38 text

Appendix

Slide 39

Slide 39 text

Deletion vector blob (Puffin spec)
• length (4 bytes, BE)
• magic bytes (4 bytes): D1 D3 39 64
• Vector, serialized using the Roaring bitmap:
  • The number of 32-bit Roaring bitmaps (serialized as 8 bytes, LE)
  • For each 32-bit Roaring bitmap, ordered by unsigned comparison of the 32-bit key
• CRC-32 checksum of the magic bytes and serialized vector (4 bytes, BE)
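The framing around the serialized vector can be sketched with the standard library. This is a minimal sketch, not the Iceberg implementation: the vector is treated as opaque placeholder bytes rather than a real Roaring bitmap, the length field is assumed to cover the magic bytes plus the vector (my reading of the Puffin spec, not stated on the slide), and `frame_dv` and `unframe_dv` are hypothetical helper names.

```python
import struct
import zlib

DV_MAGIC = bytes([0xD1, 0xD3, 0x39, 0x64])


def frame_dv(vector: bytes) -> bytes:
    """Wrap a serialized vector as: length (BE), magic, vector, CRC-32 (BE)."""
    body = DV_MAGIC + vector
    # Assumption: length counts the magic bytes plus the vector.
    return struct.pack(">I", len(body)) + body + struct.pack(">I", zlib.crc32(body))


def unframe_dv(blob: bytes) -> bytes:
    """Validate magic and checksum, then return the serialized vector bytes."""
    (length,) = struct.unpack(">I", blob[:4])
    body = blob[4:4 + length]
    (stored_crc,) = struct.unpack(">I", blob[4 + length:8 + length])
    if body[:4] != DV_MAGIC:
        raise ValueError("unexpected magic bytes")
    if zlib.crc32(body) != stored_crc:
        raise ValueError("CRC-32 mismatch")
    return body[4:]


placeholder_vector = bytes(30)  # stands in for a serialized Roaring bitmap
blob = frame_dv(placeholder_vector)
print(len(blob), unframe_dv(blob) == placeholder_vector)
```

With a 30-byte placeholder vector the framed blob is 42 bytes long: 4 bytes of length, 4 bytes of magic, the vector, and 4 bytes of checksum.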

Slide 40

Slide 40 text

Row Lineage example: INSERT INTO 100 records
metadata: v0 next-row-id=0
metadata: v1 next-row-id=100
manifest list_v1: first-row-id=0
• add_rows=75 → manifest file_v1a → data_v1a.parquet
• add_rows=25 → manifest file_v1b → data_v1b.parquet

data_v1a.parquet:
_row_id | _last_seq
0       | 1
…       | …
74      | 1

data_v1b.parquet:
_row_id | _last_seq
75      | 1
…       | …
99      | 1