
Tomohiro Tanaka
February 25, 2025

Apache Iceberg Meetup in Japan #1 - Iceberg V3 Spec

Content:
* What is Apache Iceberg?
* Iceberg format version
* New features in V3 Spec
* Deletion Vectors and Row Lineage in detail


Transcript

  1. Iceberg V3 Spec
    Iceberg Japan Meetup on 2025 Feb. 21
    Tomohiro Tanaka, Senior Cloud Support Engineer, Amazon Web Services
    © 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  2. Tomohiro Tanaka
    Senior Cloud Support Engineer, AWS Support, Amazon Web Services
    • Responsible for solving the most complex issues and guiding best practices for Iceberg
    • Contributor to the Iceberg OSS project
  3. Agenda
    • What is Apache Iceberg?
    • Iceberg format version
    • New features in V3 Spec
  4. What is Apache Iceberg?
  5. What is Apache Iceberg?
    • An open table format designed for huge analytic datasets
    • Built at Netflix by Ryan Blue and Daniel Weeks in 2017, and donated to the Apache Software Foundation in 2018
    • Brings ACID capabilities to data lakes
    • Enables data analysis with familiar SQL syntax through processing engines
    • Use cases: streaming ingestion, compliance with data regulations, CDC, etc.
  6. Iceberg works between a computing engine and storage
    [Diagram: Query engine layer → Iceberg metadata layer → Data layer (Amazon S3)]
    Read/write data through Iceberg with ACID transactions
  7. Iceberg table architecture
    Example: run CREATE TABLE and INSERT INTO
    [Diagram: Query engine → Catalog (db - tbl) → Storage]
    Storage: bucket/path/warehouse/
      - 00000-UUID.metadata.json
      - 00001-UUID.metadata.json
      - snap-SnapshotID-N-CommitUUID.avro
      - CommitUUID-mN.avro
      - data.zstd.parquet
    Steps: 1) CREATE TABLE, 2) INSERT INTO
  8. Iceberg table architecture – 3 layers
    • Catalog: the entry point of data access
      - Stores the pointer to the current table metadata as metadata_location
      - Provides atomic operations
    • Metadata: a key layer that provides the Iceberg features
      - 3 types of files: metadata files, manifest lists, and manifest files
    • Data (storage): stores data as Parquet files (by default)
  9. Iceberg table architecture – Write flow 1/2
    1) Access the catalog (db - tbl)
    2) Check the current metadata location (metadata_location → metadata file 00000-UUID.metadata.json)
    3) Write data by the query engine (data.zstd.parquet)
  10. Iceberg table architecture – Write flow 2/2
    4) Create the manifest files pointing to the data files (CommitUUID-mN.avro)
    5) Create the manifest list (snap-SnapshotID-N-CommitUUID.avro)
    6) Create the new metadata file (00001-UUID.metadata.json)
    7) Switch the metadata_location pointer in the catalog to the new metadata file
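The write flow can be sketched as a small in-memory simulation. Everything in it is hypothetical: `Catalog`, `commit_append`, the dict standing in for S3, and the simplified snapshot IDs and file contents (real writers produce Parquet, Avro and JSON files, and real snapshot IDs are random). What it does capture is step 7: the commit succeeds only if `metadata_location` still points at the metadata file the writer started from, i.e. a compare-and-swap performed as the catalog's atomic operation.

```python
import threading
import uuid


class Catalog:
    """Hypothetical catalog: maps each table name to its metadata_location."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pointers = {}

    def create(self, table, metadata_location):
        self._pointers[table] = metadata_location

    def current(self, table):
        return self._pointers[table]

    def swap(self, table, expected, new):
        """Atomically replace metadata_location only if it still equals `expected`."""
        with self._lock:
            if self._pointers.get(table) != expected:
                return False  # another writer committed first
            self._pointers[table] = new
            return True


def commit_append(catalog, storage, table, rows):
    """Append rows following steps 1-7; `storage` is a dict standing in for S3."""
    base_location = catalog.current(table)              # 1) + 2) current metadata
    base = storage[base_location]

    data_file = f"data/{uuid.uuid4()}.parquet"          # 3) write the data file
    storage[data_file] = list(rows)

    commit_uuid = uuid.uuid4()
    manifest = f"metadata/{commit_uuid}-m0.avro"        # 4) manifest -> data file
    storage[manifest] = [data_file]

    snapshot_id = len(base["snapshots"]) + 1            # simplified snapshot ID
    manifest_list = f"metadata/snap-{snapshot_id}-1-{commit_uuid}.avro"  # 5)
    storage[manifest_list] = storage.get(base["current-manifest-list"], []) + [manifest]

    version = base["version"] + 1                       # 6) new metadata file
    new_location = f"metadata/{version:05d}-{uuid.uuid4()}.metadata.json"
    storage[new_location] = {
        "version": version,
        "snapshots": base["snapshots"] + [snapshot_id],
        "current-manifest-list": manifest_list,
    }

    if not catalog.swap(table, base_location, new_location):  # 7) switch pointer
        raise RuntimeError("concurrent commit detected; retry from step 1")
    return new_location


storage = {"metadata/00000-init.metadata.json":
           {"version": 0, "snapshots": [], "current-manifest-list": None}}
catalog = Catalog()
catalog.create("db.tbl", "metadata/00000-init.metadata.json")
v1 = commit_append(catalog, storage, "db.tbl", [(1, "milk", 3)])
print(catalog.current("db.tbl") == v1)  # True
```

If `swap` fails, a real writer refreshes the table metadata, checks for conflicts, and retries the commit; the sketch simply raises.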
  11. Iceberg table architecture – Read flow
    1) Access the catalog (db - tbl)
    2) Check the current metadata location (metadata_location → 00001-UUID.metadata.json)
    3) Read each file: metadata file → manifest list (snap-SnapshotID-N-CommitUUID.avro) → manifest file (CommitUUID-mN.avro)
    4) Read data by the query engine (data.zstd.parquet)
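A matching sketch of the read flow, under similar assumptions: storage is a hypothetical dict, the metadata file is reduced to its current manifest list, and each manifest list and manifest is just a list of paths. `plan_scan` and `read_table` are illustrative names, not an Iceberg API.

```python
def plan_scan(pointers, storage, table):
    """Steps 1-3: resolve the data files of the current snapshot."""
    metadata = storage[pointers[table]]                     # 1) + 2) metadata file
    manifest_list = storage[metadata["current-manifest-list"]]  # 3) manifest list
    data_files = []
    for manifest in manifest_list:                          # 3) manifest files
        data_files.extend(storage[manifest])
    return data_files


def read_table(pointers, storage, table):
    """Step 4: read the rows of every planned data file."""
    rows = []
    for path in plan_scan(pointers, storage, table):
        rows.extend(storage[path])
    return rows


# Hypothetical table state after two appends.
storage = {
    "metadata/00002.metadata.json": {"current-manifest-list": "metadata/snap-2.avro"},
    "metadata/snap-2.avro": ["metadata/a-m0.avro", "metadata/b-m0.avro"],
    "metadata/a-m0.avro": ["data/a.parquet"],
    "metadata/b-m0.avro": ["data/b.parquet"],
    "data/a.parquet": [(1, "milk", 3)],
    "data/b.parquet": [(2, "cocoa", 4)],
}
pointers = {"db.tbl": "metadata/00002.metadata.json"}
print(read_table(pointers, storage, "db.tbl"))  # [(1, 'milk', 3), (2, 'cocoa', 4)]
```

Real engines also use the partition values and column statistics stored in manifests to skip manifests and data files before reading any data; this sketch reads everything.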
  12. Iceberg format version
  13. Iceberg is a "table format"
    • The specification of Iceberg's table format is defined at https://iceberg.apache.org/spec/
      - Specs for data types, reserved fields, partitioning, data deletion, etc.
    • Processing engines need to follow the specification in their implementations.
      - Processing engines: Apache Spark, Trino, Apache Flink, Apache Hive, etc.
  14. What is "format version"?
    The Iceberg table format specification is versioned as the format version.
    • When you create an Iceberg table, the default (or specified) format version is written to the table metadata (metadata.json).
    • The current format version is V2, and V3 is under development.
    $ cat 00000-1eb8c96e-f503-4ff9-b4e0-53cb3ede0116.metadata.json
    {
      "format-version" : 2,
      "table-uuid" : "eaf5dec9-7866-49a5-81c6-11af8f344e1f",
      "location" : "s3://bucket/iceberg-warehouse",
      "last-sequence-number" : 0,
      ...
  15. How is the format version maintained?
    The format version is incremented when readers of an older version cannot correctly read new features:
    • Not forward-compatible: older readers cannot correctly read newer tables
    • Backward-compatible: newer readers can read older tables
    The spec for a specific version is finalized by a community vote after the spec and implementation of its major features are complete.
    • (e.g.) Vote to finalize the V2 spec: https://lists.apache.org/thread/ws2gg52d124p7bx9jgrn3kctrtfgtltp
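A reader can find a table's format version by parsing `format-version` from the current metadata.json. The sketch below, using only the standard library, applies the compatibility rule above: a reader accepts the versions it knows and rejects newer ones instead of misreading them. `SUPPORTED_FORMAT_VERSIONS` and `check_readable` are hypothetical names, and the JSON is the metadata.json example from slide 14 with its remaining fields omitted.

```python
import json

# The metadata.json example from slide 14, with the remaining fields omitted.
METADATA_JSON = """{
  "format-version" : 2,
  "table-uuid" : "eaf5dec9-7866-49a5-81c6-11af8f344e1f",
  "location" : "s3://bucket/iceberg-warehouse",
  "last-sequence-number" : 0
}"""

# Hypothetical reader that was built against the V1 and V2 specs.
SUPPORTED_FORMAT_VERSIONS = {1, 2}


def check_readable(metadata_text, supported=SUPPORTED_FORMAT_VERSIONS):
    """Return the table's format version, or refuse versions newer than the reader."""
    version = json.loads(metadata_text)["format-version"]
    if version not in supported:
        # Not forward-compatible: an older reader must not try to read newer tables.
        raise ValueError(f"format-version {version} is not supported by this reader")
    return version


print(check_readable(METADATA_JSON))  # 2
```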
  16. Main concept and history of each format version
    V1: Analytic Data Tables, introducing Iceberg's fundamental table mechanisms
    V2: Row-level deletes
    • 2020 Apr. (Iceberg 0.7.0): V2 spec and implementation started
    • 2021 Aug. (Iceberg 0.12.0): V2 spec was finalized
    • 2023 Aug. (Iceberg 1.4.0): V2 became the default version (PR: #8381)
    • See https://github.com/apache/iceberg/milestone/7
    V3: Extended data types and metadata fields for new capabilities
    • 2021 Aug.: V3 spec discussion started after the V2 spec was finalized
    • The spec has not been finalized yet, but implementation has already started
    • See https://github.com/apache/iceberg/milestone/42
  17. New features in Iceberg V3
  18. New features in Iceberg V3 (as of 2025 Feb. 21)
    Feature | Spec | Impl
    Extended data types:
    • Unknown | #10955 | #12012
    • Variant | #10831 | #11831 (including for Spark)
    • Timestamp (9), with or without tz | #8683 | #9008
    • Geo (geometry/geography) | #10981 | #12346 (not merged), #12347 for Parquet (not merged)
    • New type promotions | #10955 | Not yet
    New capabilities:
    • Deletion Vectors | #11240 | Tracked in Issue #11122
    • Row Lineage | #11130 | #11948 (partially implemented)
    • Default value (Issue #10761) | | #11785 (for Parquet), #11786 (for Avro), #11803 (for Spark)
    • Multi-args transform for partitioning and sorting | #8579, plus #9661 | Not yet
  19. New features in Iceberg V3: Deletion Vectors
  20. Deletion Vectors
    • Introduced in Iceberg V3
    • Files that store the row-level deletes for their corresponding data files
    Compared to V2 deletes:
    • Less write amplification than previous deletes
    • Better compression efficiency than previous deletes
    • Faster query performance
  21. Row level deletes
    Assume one Parquet file containing the following records:
    +---+--------+-----+
    | id|   drink|price|
    +---+--------+-----+
    |  1|    milk|    3|
    |  2|   cocoa|    4|
    |  3|espresso|    5|
    +---+--------+-----+
    DELETE FROM db.tbl WHERE id = 2
  22. V1 deletes
    metadata/
      - metadata.json files
      - manifest lists, manifests
    data/
      - 00001-abcd.parquet
      - 00002-efgh.parquet
    Before the DELETE (00001-abcd.parquet):
    +---+--------+-----+
    | id|   drink|price|
    +---+--------+-----+
    |  1|    milk|    3|
    |  2|   cocoa|    4|
    |  3|espresso|    5|
    +---+--------+-----+
    After the DELETE (00002-efgh.parquet, rewritten without the deleted row):
    +---+--------+-----+
    | id|   drink|price|
    +---+--------+-----+
    |  1|    milk|    3|
    |  3|espresso|    5|
    +---+--------+-----+
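V1 has no delete files, so a row-level DELETE is applied by writing a new data file that contains every row except the deleted ones (copy-on-write), as in the two files above. A minimal sketch with rows as plain tuples; `copy_on_write_delete` is a hypothetical name:

```python
def copy_on_write_delete(rows, predicate):
    """Return the rows of the rewritten data file: all rows NOT matching the DELETE."""
    return [row for row in rows if not predicate(row)]


# (id, drink, price) rows of 00001-abcd.parquet
file_00001 = [(1, "milk", 3), (2, "cocoa", 4), (3, "espresso", 5)]
# DELETE FROM db.tbl WHERE id = 2 -> new file 00002-efgh.parquet
file_00002 = copy_on_write_delete(file_00001, lambda row: row[0] == 2)
print(file_00002)  # [(1, 'milk', 3), (3, 'espresso', 5)]
```

The cost is visible even here: deleting one row copies every other row of the file, which is the write amplification that delete files in V2 and deletion vectors in V3 aim to reduce.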
  23. Row level deletes in V2
    V1:
      metadata/ (metadata.json files, manifest lists, manifests)
      data/ 00001-abcd.parquet, 00002-efgh.parquet
    V2:
      metadata/ (metadata.json files, manifest lists, manifests)
      data/ 00000-abcd.parquet, 00000-efgh-deletes.parquet (position delete)
  24. Row level deletes in V2
    V1:
      metadata/ (metadata.json files, manifest lists, manifests)
      data/ 00001-abcd.parquet, 00002-efgh.parquet
    V2:
      metadata/ (metadata.json files, manifest lists, manifests)
      data/ 00000-abcd.parquet, 00000-efgh-deletes.parquet
    00000-abcd.parquet (data file):
    +---+--------+-----+
    | id|   drink|price|
    +---+--------+-----+
    |  1|    milk|    3|
    |  2|   cocoa|    4|
    |  3|espresso|    5|
    +---+--------+-----+
    00000-efgh-deletes.parquet (position delete file):
    +--------------------------------------+---+
    |file_path                             |pos|
    +--------------------------------------+---+
    |s3://bucket/v2/data/00000-abcd.parquet|1  |
    +--------------------------------------+---+
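A position delete is applied at read time (merge-on-read): a row is skipped when its (file_path, pos) pair appears in a delete file, where pos is the 0-based row position in the data file. A minimal sketch with in-memory rows instead of Parquet; `apply_position_deletes` is a hypothetical name:

```python
from collections import defaultdict


def apply_position_deletes(data_files, delete_rows):
    """Merge-on-read: drop rows whose (file_path, pos) is listed in a position delete file."""
    deleted = defaultdict(set)
    for file_path, pos in delete_rows:
        deleted[file_path].add(pos)
    result = []
    for file_path, rows in data_files.items():
        skip = deleted.get(file_path, set())
        for pos, row in enumerate(rows):  # pos = 0-based row position in the file
            if pos not in skip:
                result.append(row)
    return result


data_files = {"s3://bucket/v2/data/00000-abcd.parquet":
              [(1, "milk", 3), (2, "cocoa", 4), (3, "espresso", 5)]}
position_deletes = [("s3://bucket/v2/data/00000-abcd.parquet", 1)]
print(apply_position_deletes(data_files, position_deletes))
# [(1, 'milk', 3), (3, 'espresso', 5)]
```

The data file itself is not rewritten; only the small delete file is added, and every reader pays the cost of merging it.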
  25. Row level deletes in V3
    V2:
      metadata/ (metadata.json files, manifest lists, manifests)
      data/ 00000-abcd.parquet, 00000-efgh-deletes.parquet (position delete)
    V3:
      metadata/ (metadata.json files, manifest lists, manifests)
      data/ 00000-abcd.parquet, 00000-abcd-deletes.puffin (deletion vector)
  26. Row level deletes in V3
    V2:
      metadata/ (metadata.json files, manifest lists, manifests)
      data/ 00000-abcd.parquet (1 KiB), 00000-efgh-deletes.parquet (1.5 KiB)
    V3:
      metadata/ (metadata.json files, manifest lists, manifests)
      data/ 00000-abcd.parquet (1 KiB), 00000-abcd-deletes.puffin (0.5 KiB)
  27. Puffin format and deletion vector
    $ cat 00000-abcd-deletes.puffin
    PFA1 " 9d:0 PFA1 {"blobs":[{"type":"deletion-vector-v1","fields":[2147483645],"snapshot-id":-1,"sequence-number":-1,"offset":4,"length":42,"properties":{"referenced-data-file":"s3://bucket/v3/data/00000-5086-bf58b80c-84ad-4b34-bbb3-02e144a39bcb-0-00001.parquet","cardinality":"1"}}],"properties":{"created-by":"Iceberg Apache Iceberg 1.8.0"}} PFA1
    Layout (Puffin spec):
    • Magic number: 0x50, 0x46, 0x41, 0x31 ("PFA1")
    • Blobs: the deletion vector, with the position vector serialized using Roaring bitmaps
    • Footer: Magic number, FooterPayload (JSON), FooterPayloadSize (4 bytes), Flags (4 bytes), Magic number
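To make the layout concrete, the sketch below writes a Puffin file in memory and reads the footer back by working from the end of the file. It follows the Puffin spec's footer layout, including the 4-byte Flags field before the trailing magic (byte 0, bit 0 marks a compressed payload). The 42-byte blob is a zero-filled placeholder rather than a real deletion vector, `write_puffin` and `read_footer` are hypothetical names, and only uncompressed footers are handled.

```python
import json
import struct

MAGIC = b"PFA1"  # 0x50 0x46 0x41 0x31


def write_puffin(blobs, properties):
    """Build Puffin bytes: Magic, blobs, Footer (uncompressed JSON payload).

    blobs: list of (type, fields, payload_bytes, properties) tuples.
    """
    out = bytearray(MAGIC)
    blob_metadata = []
    for blob_type, fields, payload, blob_props in blobs:
        blob_metadata.append({
            "type": blob_type, "fields": fields,
            "snapshot-id": -1, "sequence-number": -1,
            "offset": len(out), "length": len(payload),
            "properties": blob_props,
        })
        out += payload
    footer_payload = json.dumps({"blobs": blob_metadata, "properties": properties}).encode()
    out += MAGIC + footer_payload
    out += struct.pack("<I", len(footer_payload))  # FooterPayloadSize, little-endian
    out += bytes(4)                                 # Flags: all zero = not compressed
    out += MAGIC
    return bytes(out)


def read_footer(data):
    """Decode the footer JSON by reading backwards from the end of the file."""
    if data[:4] != MAGIC or data[-4:] != MAGIC:
        raise ValueError("not a Puffin file")
    if data[-8] & 0x01:  # Flags byte 0, bit 0: FooterPayloadCompressed
        raise ValueError("compressed footer payloads are not handled in this sketch")
    (size,) = struct.unpack("<I", data[-12:-8])
    start = len(data) - 12 - size
    if data[start - 4:start] != MAGIC:
        raise ValueError("footer magic not found")
    return json.loads(data[start:start + size])


puffin = write_puffin(
    [("deletion-vector-v1", [2147483645], bytes(42),  # placeholder 42-byte blob
      {"referenced-data-file": "s3://bucket/v3/data/00000-abcd.parquet",
       "cardinality": "1"})],
    {"created-by": "sketch"},
)
footer = read_footer(puffin)
print(footer["blobs"][0]["offset"], footer["blobs"][0]["length"])  # 4 42
```

The first blob starts at offset 4, right after the leading magic number, which is why the footer in the `cat` output records `"offset":4`.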
  28. Future of deletes
    In Iceberg V3:
    • Position deletes are deprecated and replaced with deletion vectors.
    • Equality deletes still exist; however, their deprecation has been discussed in the community: https://lists.apache.org/thread/6fhpjszsfxd8p0vfzc3k5vw7zmcyv2mq
  29. New features in Iceberg V3: Row Lineage
  30. Row Lineage
    • Tracks changes to rows in an Iceberg table
      - Use cases: CDC, incremental data processing, audit logs, MV maintenance, etc.
    • Added in Iceberg 1.8.0
    • Enabled by setting the table property row-lineage to true
    Considerations:
    • Once enabled, it cannot be disabled.
    • If your Iceberg table includes equality deletes, you cannot enable it.
      - → Equality deletes are also allowed with Row Lineage (see spec #12230)
  31. Row Lineage mechanism
    The following fields are added to track row changes:
    • _row_id: each row's identifier
    • _last_updated_sequence_number: the sequence number of the commit that last updated the row
    Additional fields are used as row ID checkpoints:
    • next-row-id (= first-row-id + added_rows_count): the row ID checkpoint
    • first-row-id: (for now) stored in metadata
    • added/existing/deleted_rows_count: stored in manifest lists
  32. Summary
    • Iceberg is a table format whose specification defines its features.
    • The specification is versioned by format-version. The current version is V2, and V3 is in development.
    • Iceberg V3 extends data types and introduces new capabilities:
      - New data types such as unknown, variant, nanosecond timestamps, and geo
      - New capabilities such as deletion vectors, row lineage, multi-argument transforms for partitioning and sorting, default values, etc.
  33. How are Puffin files for Deletion Vectors updated?
    A new Puffin file is created with a new DV, and the deletes from the previous DV are also carried into the new Puffin file.
    Example: the current Iceberg table records and the relevant metadata and data files
    +---+--------+-----+
    |id |drink   |price|
    +---+--------+-----+
    |1  |milk    |3    |
    |2  |cocoa   |4    |
    |3  |espresso|5    |
    +---+--------+-----+
    s3://bucket/iceberg-v3-spec/dv_puff/
    - metadata/00000-89225cb5-ae9f-4f76-921b-d9977784303e.metadata.json
    - metadata/00001-749e89a3-208f-47a4-b377-fee2a4c7f868.metadata.json
    - metadata/907196cf-b487-4ee5-8261-46991c0cbb01-m0.avro
    - metadata/snap-992865481207666524-1-907196cf-b487-4ee5-8261-46991c0cbb01.avro
    - data/00000-3-896a8c5a-94dd-4c67-a040-f0a8dc455707-0-00001.parquet
  34. How are Puffin files for Deletion Vectors updated?
    A new Puffin file is created with a new DV, and the deletes from the previous DV are also carried into the new Puffin file.
    Example: run DELETE FROM db.tbl WHERE id = 2
    +---+--------+-----+
    |id |drink   |price|
    +---+--------+-----+
    |1  |milk    |3    |
    |2  |cocoa   |4    |
    |3  |espresso|5    |
    +---+--------+-----+
    s3://bucket/iceberg-v3-spec/dv_puff/
    - metadata/00000-89225cb5-ae9f-4f76-921b-d9977784303e.metadata.json
    - metadata/00001-749e89a3-208f-47a4-b377-fee2a4c7f868.metadata.json
    - metadata/00002-80aef5fd-91a0-4622-9b16-8902f08ba32a.metadata.json
    - metadata/907196cf-b487-4ee5-8261-46991c0cbb01-m0.avro
    - metadata/8c81a87a-61ad-4084-8c2d-63b56a9bafff-m0.avro
    - metadata/snap-992865481207666524-1-907196cf-b487-4ee5-8261-46991c0cbb01.avro
    - metadata/snap-2280167229853693236-1-8c81a87a-61ad-4084-8c2d-63b56a9bafff.avro
    - data/00000-3-896a8c5a-94dd-4c67-a040-f0a8dc455707-0-00001.parquet
    - data/00000-7-8eaacff6-3a7c-4e08-9fb5-5355576005c9-00001-deletes.puffin
  35. How are Puffin files for Deletion Vectors updated?
    A new Puffin file is created with a new DV, and the deletes from the previous DV are also carried into the new Puffin file.
    Example: run DELETE FROM db.tbl WHERE id = 1
    +---+--------+-----+
    |id |drink   |price|
    +---+--------+-----+
    |1  |milk    |3    |
    |3  |espresso|5    |
    +---+--------+-----+
    s3://bucket/iceberg-v3-spec/dv_puff/
    - metadata/00000-89225cb5-ae9f-4f76-921b-d9977784303e.metadata.json
    - metadata/00001-749e89a3-208f-47a4-b377-fee2a4c7f868.metadata.json
    - metadata/00002-80aef5fd-91a0-4622-9b16-8902f08ba32a.metadata.json
    - metadata/00003-10439861-354c-4c79-a50d-77f6cd7fbd9b.metadata.json
    - metadata/907196cf-b487-4ee5-8261-46991c0cbb01-m0.avro
    - metadata/8c81a87a-61ad-4084-8c2d-63b56a9bafff-m0.avro
    - metadata/fae34f8f-862d-43f1-bea7-d794e65150ab-m0.avro
    - metadata/fae34f8f-862d-43f1-bea7-d794e65150ab-m1.avro
    - metadata/snap-992865481207666524-1-907196cf-b487-4ee5-8261-46991c0cbb01.avro
    - metadata/snap-2280167229853693236-1-8c81a87a-61ad-4084-8c2d-63b56a9bafff.avro
    - metadata/snap-4965038673004682275-1-fae34f8f-862d-43f1-bea7-d794e65150ab.avro
    - data/00000-3-896a8c5a-94dd-4c67-a040-f0a8dc455707-0-00001.parquet
    - data/00000-7-8eaacff6-3a7c-4e08-9fb5-5355576005c9-00001-deletes.puffin
    - data/00000-10-1d66d4f5-8ed9-4c97-ac98-9410134b5fbf-00001-deletes.puffin
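Conceptually, each DELETE on a data file that already has a DV produces a new DV holding the earlier positions plus the new ones, and the new DV takes over from the old one. The sketch below assumes that at most one DV is live per data file; `DeletionVectorIndex` and the shortened file names are hypothetical, and Puffin I/O is left out.

```python
class DeletionVectorIndex:
    """Hypothetical index: referenced data file -> (Puffin path, deleted positions)."""

    def __init__(self):
        self.live = {}

    def delete(self, data_file, positions, new_puffin_path):
        _, previous = self.live.get(data_file, (None, frozenset()))
        merged = previous | frozenset(positions)          # carry earlier deletes forward
        self.live[data_file] = (new_puffin_path, merged)  # new DV supersedes the old one
        return sorted(merged)


index = DeletionVectorIndex()
data_file = "data/00000-3.parquet"  # shortened, hypothetical name
# Row positions: 0 -> id=1 (milk), 1 -> id=2 (cocoa), 2 -> id=3 (espresso)
print(index.delete(data_file, [1], "data/00000-7-deletes.puffin"))   # [1]     (id = 2)
print(index.delete(data_file, [0], "data/00000-10-deletes.puffin"))  # [0, 1]  (id = 1)
```

Because the newest DV already contains every deleted position, a reader only needs one DV per data file instead of merging a growing list of delete files.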
  36. Manifest files
    Newly generated manifest files contain the file paths of the Puffin files.
    avro-tools tojson fae34f8f-862d-43f1-bea7-d794e65150ab-m0.avro | jq ".data_file |{file_path: .file_path, file_format: .file_format, referenced_data_file: .referenced_data_file}"
    {
      "file_path": "s3://bucket/iceberg-v3-spec/dv_puff/data/00000-7-8eaacff6-3a7c-4e08-9fb5-5355576005c9-00001-deletes.puffin",
      "file_format": "PUFFIN",
      "referenced_data_file": {
        "string": "s3://bucket/iceberg-v3-spec/dv_puff/data/00000-3-896a8c5a-94dd-4c67-a040-f0a8dc455707-0-00001.parquet"
      }
    }
    avro-tools tojson fae34f8f-862d-43f1-bea7-d794e65150ab-m1.avro | jq ".data_file |{file_path: .file_path, file_format: .file_format, referenced_data_file: .referenced_data_file}"
    {
      "file_path": "s3://bucket/iceberg-v3-spec/dv_puff/data/00000-10-1d66d4f5-8ed9-4c97-ac98-9410134b5fbf-00001-deletes.puffin",
      "file_format": "PUFFIN",
      "referenced_data_file": {
        "string": "s3://bucket/iceberg-v3-spec/dv_puff/data/00000-3-896a8c5a-94dd-4c67-a040-f0a8dc455707-0-00001.parquet"
      }
    }
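The jq projection above can also be done in Python over the JSON lines that `avro-tools tojson` prints. One detail worth noting: `referenced_data_file` is an optional Avro field, so its JSON encoding wraps the value as `{"string": ...}`. The records below are hypothetical, trimmed to three fields, with shortened paths, and `dv_files_by_data_file` is an illustrative name.

```python
import json
from collections import defaultdict


def dv_files_by_data_file(json_lines):
    """Group Puffin (DV) entries from `avro-tools tojson` output by referenced data file."""
    groups = defaultdict(list)
    for line in json_lines:
        data_file = json.loads(line)["data_file"]
        if data_file["file_format"] != "PUFFIN":
            continue
        referenced = data_file["referenced_data_file"]
        # Optional Avro fields are unions, so the JSON encoding wraps them: {"string": ...}
        if isinstance(referenced, dict):
            referenced = referenced["string"]
        groups[referenced].append(data_file["file_path"])
    return dict(groups)


# Hypothetical, trimmed records (real manifest entries have many more fields).
lines = [
    '{"data_file": {"file_path": "data/00000-7-deletes.puffin", "file_format": "PUFFIN",'
    ' "referenced_data_file": {"string": "data/00000-3.parquet"}}}',
    '{"data_file": {"file_path": "data/00000-10-deletes.puffin", "file_format": "PUFFIN",'
    ' "referenced_data_file": {"string": "data/00000-3.parquet"}}}',
]
print(dv_files_by_data_file(lines))
```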
  37. Deletion vector blob (Puffin spec)
    • length (4 bytes, big-endian): combined length of the magic bytes and the vector
    • magic bytes (4 bytes): D1 D3 39 64
    • vector: serialized using Roaring bitmaps
      - The number of 32-bit Roaring bitmaps (serialized as 8 bytes, little-endian)
      - For each 32-bit Roaring bitmap, ordered by unsigned comparison of the 32-bit key: the key (4 bytes, little-endian) followed by the bitmap in the portable format
    • CRC-32 checksum of the magic bytes and the serialized vector (4 bytes, big-endian)
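The blob layout can be checked with a small serializer and parser. This is a sketch, not Iceberg's implementation: it only emits Roaring array containers (at most 4,096 positions per 16-bit chunk) in the portable format without run containers, whereas real writers may also produce bitmap or run containers. The 32-bit portable-format details used here (cookie 12346, a descriptive header of key and cardinality minus one, then an offset header) come from the Roaring format specification; the function names are hypothetical.

```python
import struct
import zlib
from collections import defaultdict

DV_MAGIC = bytes([0xD1, 0xD3, 0x39, 0x64])
NO_RUN_COOKIE = 12346  # Roaring SERIAL_COOKIE_NO_RUNCONTAINER


def roaring32(values):
    """Portable 32-bit Roaring format using array containers only."""
    containers = defaultdict(list)
    for v in sorted(set(values)):
        containers[v >> 16].append(v & 0xFFFF)
    keys = sorted(containers)
    if any(len(containers[k]) > 4096 for k in keys):
        raise ValueError("sketch only emits array containers (<= 4096 values each)")
    out = bytearray(struct.pack("<II", NO_RUN_COOKIE, len(keys)))
    for k in keys:  # descriptive header: 16-bit key, cardinality - 1
        out += struct.pack("<HH", k, len(containers[k]) - 1)
    offset = len(out) + 4 * len(keys)  # offset header: start of each container
    for k in keys:
        out += struct.pack("<I", offset)
        offset += 2 * len(containers[k])
    for k in keys:  # array containers: sorted 16-bit values
        out += struct.pack(f"<{len(containers[k])}H", *containers[k])
    return bytes(out)


def serialize_dv(positions):
    """DV blob: length (BE), magic, 64-bit Roaring vector, CRC-32 (BE)."""
    by_high = defaultdict(list)
    for p in positions:
        by_high[p >> 32].append(p & 0xFFFFFFFF)
    vector = bytearray(struct.pack("<Q", len(by_high)))  # number of 32-bit bitmaps
    for high in sorted(by_high):
        vector += struct.pack("<I", high) + roaring32(by_high[high])
    body = DV_MAGIC + bytes(vector)
    return (struct.pack(">I", len(body)) + body
            + struct.pack(">I", zlib.crc32(body) & 0xFFFFFFFF))


def deserialize_dv(blob):
    """Inverse of serialize_dv for bitmaps without run containers."""
    (length,) = struct.unpack_from(">I", blob, 0)
    body = blob[4:4 + length]
    (crc,) = struct.unpack_from(">I", blob, 4 + length)
    if body[:4] != DV_MAGIC or zlib.crc32(body) & 0xFFFFFFFF != crc:
        raise ValueError("bad magic bytes or checksum")
    (count,) = struct.unpack_from("<Q", body, 4)
    pos, positions = 12, []
    for _ in range(count):
        high, cookie, n = struct.unpack_from("<III", body, pos)
        if cookie != NO_RUN_COOKIE:
            raise ValueError("sketch only reads bitmaps without run containers")
        start = pos + 4          # the 32-bit bitmap begins after its 4-byte key
        end = start + 8          # cookie + container count
        for i in range(n):
            key, card_minus_one = struct.unpack_from("<HH", body, start + 8 + 4 * i)
            (offset,) = struct.unpack_from("<I", body, start + 8 + 4 * n + 4 * i)
            card = card_minus_one + 1
            lows = struct.unpack_from(f"<{card}H", body, start + offset)
            positions += [(high << 32) | (key << 16) | low for low in lows]
            end = start + offset + 2 * card
        pos = end
    return positions


blob = serialize_dv([1])  # DELETE ... WHERE id = 2 removes row position 1
print(len(blob))          # 42
```

For a single deleted position this produces 42 bytes: 4 (length) + 4 (magic) + 30 (vector: 8-byte count, 4-byte key, 18-byte bitmap) + 4 (CRC), consistent with the `"length":42` recorded in the Puffin footer on slide 27.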
  38. Row Lineage example
    INSERT INTO 100 records:
    • metadata v0: next-row-id=0
    • metadata v1: next-row-id=100
      - manifest list_v1: first-row-id=0
        - manifest file_v1a (added_rows=75) → data_v1a.parquet: _row_id 0 … 74, _last_seq 1
        - manifest file_v1b (added_rows=25) → data_v1b.parquet: _row_id 75 … 99, _last_seq 1
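The row ID assignment in this example can be reproduced with a short sketch. It assumes each data file inherits a first row ID equal to the snapshot's first-row-id plus the record counts of the files before it, that each row's `_row_id` is that value plus the row's position in the file, and that `_last_updated_sequence_number` is the commit's sequence number. `assign_row_ids` is hypothetical, and metadata is reduced to plain numbers.

```python
def assign_row_ids(next_row_id, manifests, sequence_number):
    """Hypothetical sketch of row ID inheritance for one commit.

    manifests: list of manifests, each a list of (data_file_name, record_count).
    Returns (new next-row-id, {data_file_name: [(_row_id, _last_seq), ...]}).
    """
    first_row_id = next_row_id  # first-row-id recorded for this commit
    row_id = first_row_id
    rows = {}
    for manifest in manifests:
        for data_file, record_count in manifest:
            # _row_id = first row ID inherited by the file + row position in the file
            rows[data_file] = [(row_id + pos, sequence_number) for pos in range(record_count)]
            row_id += record_count
    return row_id, rows  # next-row-id = first-row-id + added rows


next_id, rows = assign_row_ids(
    0, [[("data_v1a.parquet", 75)], [("data_v1b.parquet", 25)]], sequence_number=1)
print(next_id)                       # 100
print(rows["data_v1a.parquet"][-1])  # (74, 1)
print(rows["data_v1b.parquet"][0])   # (75, 1)
```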