Apache Iceberg V3 and migration to V3

Tomohiro Tanaka

January 27, 2026

Transcript

  1. © 2025, Amazon Web Services, Inc. or its affiliates. All rights reserved.
    Apache Iceberg V3: Latest Status and Migration
    Apache Iceberg Meetup #4
    Tomohiro Tanaka, Senior Cloud Support Engineer, AWS
  2. Tomohiro Tanaka
    Senior Cloud Support Engineer, AWS Support, Amazon Web Services
    • Responsible for resolving the most complex issues and advising on best practices with Iceberg
    • Contributes to the Apache Iceberg OSS project
  3. Agenda
    • Iceberg format versions
    • Features in Iceberg V3 Spec
    • Migration from V2 to V3
    • Quick intro: Iceberg V4 Spec
  4. Iceberg format version
    • Defines the specification that Iceberg clients must follow
    • Not the same as the Iceberg library version (e.g. iceberg-1.10.1)
    • Incremented when a change would prevent older clients from reading the table (i.e. the change is not forward-compatible)
    • Once a version's spec is finalized, new features go into the next version
    • Stored in the table's metadata.json file
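Because the format version lives in metadata.json, a client can decide up front whether it is able to read a table. The following is a minimal sketch of that check, not how any particular engine implements it. The `format-version` field name comes from the table metadata; `MAX_SUPPORTED_FORMAT_VERSION`, the helper names, and the trimmed sample document are assumptions made for illustration.

```python
import json

# Highest format version this hypothetical client implements (assumption).
MAX_SUPPORTED_FORMAT_VERSION = 2


def read_format_version(metadata_json: str) -> int:
    """Return the table's format version from a metadata.json document."""
    metadata = json.loads(metadata_json)
    return int(metadata["format-version"])


def can_read(metadata_json: str,
             max_supported: int = MAX_SUPPORTED_FORMAT_VERSION) -> bool:
    """A client should refuse tables newer than the version it implements."""
    return read_format_version(metadata_json) <= max_supported


# Trimmed-down sample; real metadata files carry many more fields.
sample_metadata = json.dumps(
    {"format-version": 3, "table-uuid": "hypothetical-uuid"}
)

if __name__ == "__main__":
    print(read_format_version(sample_metadata))
    print(can_read(sample_metadata))
```

A client capped at V2 rejects this V3 table, while a client passing `max_supported=3` accepts it. This is the compatibility rule that the version increment is meant to signal.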
  5. History of Iceberg format versions
    • Version 1: Iceberg's fundamental table mechanisms
    • Version 2: Row-level deletes
      https://github.com/apache/iceberg/milestone/4
    • Version 3 (latest): Extended data types and capabilities
      https://github.com/apache/iceberg/milestone/42
    • Version 4 (under development): Adaptive table tree structure and single-file commits, relative paths, column stats improvements, etc.
      https://github.com/apache/iceberg/milestone/58
    Ref: https://github.com/apache/iceberg/milestones
  6. Features in Iceberg V3 Spec
    Extended data types:
    • Unknown
    • Variant
    • Timestamp (9) with or without time zone
    • Geo (Geometry/Geography)
    • New type promotions
    New capabilities:
    • Deletion Vectors
    • Row Lineage
    • Default values
    • Multi-argument transforms
    • Table encryption keys
  8. Variant type
    • Proposal: https://github.com/apache/iceberg/issues/10392
      § doc: https://docs.google.com/document/d/1sq70XDiWJ2DemWyA5dVB80gKzwi0CWoM0LOWM7VJVd8/
    • Stores semi-structured data such as JSON or Avro in a single column, rather than as a String
    • Binary encoding of semi-structured data gives better performance than storing it as a String
    • To extract values from a variant column, use the variant_get function (in Spark)
    • Supported in Iceberg 1.10.0+ and Spark 4.0+
  9. Example (1/3) – DDL (Spark SQL)
    CREATE TABLE variant (txid int, data variant)
    USING iceberg
    TBLPROPERTIES ('format-version'='3');
    DESCRIBE EXTENDED variant;
  10. Example (2/3) – Write
    INSERT INTO variant VALUES
      (1, parse_json('{"device_id": "a001", "timestamp": "2026-01-21T13:00:00", "color": "red"}')),
      (2, parse_json('{"device_id": "a002", "timestamp": "2026-01-21T13:05:00", "color": "blue"}')),
      (3, parse_json('{"device_id": "a003", "timestamp": "2026-01-21T13:10:00", "color": "green"}'))
  11. Example (3/3) – Read a variant field
    SELECT
      variant_get(data, '$.device_id', 'string') AS dev_id,
      variant_get(data, '$.color', 'string') AS device_color
    FROM variant
  12. Deletion Vectors
  13. V2 Row-level deletes
    Data file 01234-5-abdcefg.parquet:
      id      drink        price
      1       milk         3.00
      2       cocoa        4.00
      3       espresso     5.00
      …       …            …
      100000  white mocha  6.00
    DELETE FROM db.tbl WHERE id = 2
    • File-level deletes: Create a new file without deleted records.
      id      drink        price
      1       milk         3.00
      3       espresso     5.00
      …       …            …
      100000  white mocha  6.00
    • Row-level deletes: Only delete files are created.
      Delete file 67890-1-hijklmn-deletes.parquet:
      file_path              pos
      s3://bucket/a.parquet  1
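With row-level deletes, the data file stays untouched and the reader filters out deleted rows at read time. The sketch below illustrates that merge step for the slide's example. It assumes `pos` is a 0-based ordinal within the data file (so `pos = 1` is the second row, `id = 2`); the function name and the in-memory tuples are hypothetical stand-ins for Parquet files.

```python
def apply_position_deletes(file_path, rows, position_deletes):
    """Return the live rows of one data file.

    position_deletes is a list of (file_path, pos) records, as stored in a
    V2 position delete file; pos is assumed to be a 0-based row ordinal.
    """
    deleted_positions = {
        pos for path, pos in position_deletes if path == file_path
    }
    return [
        row for pos, row in enumerate(rows) if pos not in deleted_positions
    ]


# In-memory stand-ins for the data file and the delete file on the slide.
data_file = "s3://bucket/a.parquet"
rows = [(1, "milk", 3.00), (2, "cocoa", 4.00), (3, "espresso", 5.00)]
position_deletes = [("s3://bucket/a.parquet", 1)]

live_rows = apply_position_deletes(data_file, rows, position_deletes)

if __name__ == "__main__":
    for row in live_rows:
        print(row)
```

The write is cheap because only the small delete record is added. The cost moves to every read, which has to join data files against their delete records.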
  14. Deletion Vectors
    • Proposal: https://github.com/apache/iceberg/issues/11122
      § doc: https://docs.google.com/document/d/18Bqhr-vnzFfQk1S4AgRISkA_5_m5m32Nnc2Cw0zn2XM
    • Bitmap-based deletion markers
    • An enhancement of the row-level deletes introduced in V2
    • Requires the merge-on-read write mode
    • Smaller delete files and better read/write performance compared to V2
    • Stored in a "Puffin" format file
    • Supported in Iceberg 1.8.0+ and Spark 3.5+
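To make "bitmap-based deletion markers" concrete, here is a toy bitmap that marks deleted row positions for a single data file. It is only an illustration: it uses a plain Python integer as the bit set, whereas real deletion vectors are serialized Roaring bitmaps. The class and method names are hypothetical.

```python
class DeletionBitmap:
    """Toy bitmap of deleted row positions for one data file.

    A Python int serves as the bit set; this is NOT the Roaring bitmap
    encoding that Iceberg deletion vectors actually use.
    """

    def __init__(self):
        self._bits = 0

    def delete(self, pos: int) -> None:
        """Mark the row at the given 0-based position as deleted."""
        self._bits |= 1 << pos

    def is_deleted(self, pos: int) -> bool:
        return (self._bits >> pos) & 1 == 1

    def cardinality(self) -> int:
        """Number of deleted rows."""
        return bin(self._bits).count("1")

    def to_bytes(self) -> bytes:
        """Dense little-endian encoding, one bit per row position."""
        length = (self._bits.bit_length() + 7) // 8 or 1
        return self._bits.to_bytes(length, "little")


bitmap = DeletionBitmap()
for position in (1, 5, 9):
    bitmap.delete(position)

if __name__ == "__main__":
    print(bitmap.is_deleted(5), bitmap.is_deleted(6))
    print(bitmap.cardinality(), len(bitmap.to_bytes()))
```

Three deletes fit in two bytes here. A V2 position delete file would instead repeat the full data file path alongside each position.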
  15. Puffin format
    File layout:
    • Magic number: 0x50, 0x46, 0x41, 0x31
    • Blobs: Blob_0, Blob_1, …
    • Footer
    Footer layout:
    • Magic number: 0x50, 0x46, 0x41, 0x31
    • FooterPayload (JSON)
    • FooterPayloadSize (4 bytes)
    • Magic number
    Blob_0 (type: deletion-vector-v1):
    • length (4 bytes)
    • magic bytes (4 bytes): D1 D3 39 64
    • Deletion vector (serialized as a Roaring bitmap)
    • CRC-32 checksum
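The file layout above can be exercised with a short round-trip sketch: write the magic number, the blobs, and the footer, then locate the JSON footer payload by reading backwards from the end of the file. Two details are assumptions taken from the Puffin spec rather than from the slide and should be checked against it: FooterPayloadSize is little-endian, and a 4-byte flags field sits between FooterPayloadSize and the trailing magic number. The payload contents are a trimmed, hypothetical example, and the blob bytes are a placeholder, not a real deletion vector.

```python
import json
import struct

MAGIC = bytes([0x50, 0x46, 0x41, 0x31])  # "PFA1"


def build_puffin(blobs: list, footer_payload: dict) -> bytes:
    """Assemble a Puffin-style file: magic, blobs, footer."""
    payload = json.dumps(footer_payload).encode("utf-8")
    footer = (
        MAGIC
        + payload
        + struct.pack("<i", len(payload))  # FooterPayloadSize (assumed little-endian)
        + bytes(4)                         # flags (assumed field; all zero = uncompressed)
        + MAGIC
    )
    return MAGIC + b"".join(blobs) + footer


def read_footer_payload(data: bytes) -> dict:
    """Locate and decode the JSON footer payload from the end of the file."""
    if data[:4] != MAGIC or data[-4:] != MAGIC:
        raise ValueError("not a Puffin file: bad magic number")
    # Tail layout: ... payload | size (4) | flags (4) | magic (4)
    (payload_size,) = struct.unpack("<i", data[-12:-8])
    payload_end = len(data) - 12
    payload_start = payload_end - payload_size
    if data[payload_start - 4:payload_start] != MAGIC:
        raise ValueError("footer does not start with the magic number")
    return json.loads(data[payload_start:payload_end].decode("utf-8"))


# Placeholder blob and a trimmed footer payload (hypothetical values).
blob = bytes(10)
payload = {"blobs": [{"type": "deletion-vector-v1", "offset": 4, "length": 10}]}
puffin_bytes = build_puffin([blob], payload)

if __name__ == "__main__":
    print(read_footer_payload(puffin_bytes))
```

Because the footer is found from the end of the file, a reader can fetch it with one small ranged read and then jump straight to the blob it needs using the recorded offset and length.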
  16. Example (1/3) – table records (Spark SQL)
    SELECT * FROM review
  17. Example (2/3) – delete in V2
    DELETE FROM review WHERE review_year <= 2015
    /path/to/warehouse/data
    - 00000-229-ff2ba929-305a-4a39-a1d4-1ac7f513c1d4-0-00001.parquet
    - 00001-230-ff2ba929-305a-4a39-a1d4-1ac7f513c1d4-0-00001.parquet
    - 00002-231-ff2ba929-305a-4a39-a1d4-1ac7f513c1d4-0-00001.parquet
    - 00000-110-eb243d8e-bfd4-4b9c-833a-e1e041f07c1c-00001-deletes.parquet
    - 00001-111-eb243d8e-bfd4-4b9c-833a-e1e041f07c1c-00001-deletes.parquet
    - 00002-112-eb243d8e-bfd4-4b9c-833a-e1e041f07c1c-00001-deletes.parquet
    Tip: The number of delete files can be configured with write.delete.granularity (default in Spark: file).
  18. Example (3/3) – delete in V3
    DELETE FROM review WHERE review_year <= 2015
    /path/to/warehouse/data
    - 00000-229-ff2ba929-305a-4a39-a1d4-1ac7f513c1d4-0-00001.parquet
    - 00001-230-ff2ba929-305a-4a39-a1d4-1ac7f513c1d4-0-00001.parquet
    - 00002-231-ff2ba929-305a-4a39-a1d4-1ac7f513c1d4-0-00001.parquet
    - 00000-110-eb243d8e-bfd4-4b9c-833a-e1e041f07c1c-00001-deletes.puffin
    - 00001-111-eb243d8e-bfd4-4b9c-833a-e1e041f07c1c-00001-deletes.puffin
    - 00002-112-eb243d8e-bfd4-4b9c-833a-e1e041f07c1c-00001-deletes.puffin
  19. Comparison between V2 and V3 delete files
    [Figure: data files, delete files in V2, delete files in V3]
  20. Migration from Iceberg V2 to V3
  21. How can I migrate Iceberg V2 to V3?
    • Run (Spark SQL):
      § ALTER TABLE db.tbl SET TBLPROPERTIES ('format-version'='3')
    • Update the table format version last. Update readers/writers (and the REST Catalog) first.
    • Note that:
      § The compute engines you use must implement the V3 features you need.
      § A table cannot be changed back to an older version (e.g. V3 to V2).
      § Readers/writers that support a newer version can read from and write to tables of an older version, but not the other way around.
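The ordering advice ("update the format version last") can be expressed as a pre-flight check that runs before the ALTER TABLE statement. This is a hypothetical sketch, not an Iceberg API: the function name, the client inventory, and the maximum supported versions are all assumptions that you would replace with your own environment's data.

```python
def check_upgrade(current_version, target_version, client_max_versions):
    """Return the problems that should block a format-version change.

    client_max_versions maps a reader/writer name to the highest format
    version that client implements (hypothetical inventory data).
    """
    problems = []
    if target_version < current_version:
        problems.append(
            f"cannot downgrade from V{current_version} to V{target_version}"
        )
    for name, max_version in sorted(client_max_versions.items()):
        if max_version < target_version:
            problems.append(f"{name} supports only up to V{max_version}")
    return problems


# Hypothetical inventory of everything that reads or writes the table.
clients = {"spark-etl": 3, "legacy-reader": 2}

if __name__ == "__main__":
    for problem in check_upgrade(2, 3, clients):
        print(problem)
```

An empty result means every known client can handle the target version, so the format version can be bumped. Since the change cannot be reverted, running a check like this first is cheaper than discovering a lagging reader afterwards.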
  22. Other considerations for V3 migration
    • Deletion Vectors:
      § Readers with V3 support can still read V2 position delete files after the V3 migration.
      § Once a V3 delete occurs, the content of the existing V2 position delete files is moved into V3 Puffin files.
      § Running the rewrite_position_delete_files Spark procedure on a V3 table merges the existing V2 position delete files into V3 Puffin files.
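The "moved into V3 Puffin files" step amounts to a union: for each affected data file, the positions already recorded in V2 position delete files are combined with the newly deleted positions into a single deletion vector. The sketch below shows only that set logic under simplified assumptions; the function name and the in-memory records are hypothetical, and a sorted list stands in for the serialized bitmap.

```python
def merge_into_deletion_vector(v2_position_deletes, new_deleted_positions,
                               data_file):
    """Combine existing V2 position deletes with new deletes for one file.

    v2_position_deletes: (file_path, pos) records from V2 delete files.
    new_deleted_positions: positions removed by the current V3 delete.
    Returns the sorted positions that the resulting deletion vector would
    mark as deleted (a stand-in for the serialized bitmap).
    """
    positions = {
        pos for path, pos in v2_position_deletes if path == data_file
    }
    positions.update(new_deleted_positions)
    return sorted(positions)


# Hypothetical V2 delete records spanning two data files.
v2_deletes = [("a.parquet", 1), ("b.parquet", 0), ("a.parquet", 7)]
merged = merge_into_deletion_vector(v2_deletes, [3, 7], "a.parquet")

if __name__ == "__main__":
    print(merged)
```

Deletes for `b.parquet` are left alone in this sketch, which matches the slide's point that V2 position delete files remain readable until a V3 delete or a rewrite touches them.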
  23. Other considerations for V3 migration
    • Row Lineage:
      § After the version is changed to V3, next-row-id is assigned, but this does not affect the current records.
      § Once table records are changed, _row_id is assigned to each row. _last_updated_sequence_number is inherited.
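A simplified model of how row IDs are handed out may help here. It assumes each data file receives a starting ID taken from the table's next-row-id counter, and that a row's `_row_id` is that starting ID plus the row's position in the file. In the actual spec the assignment flows through snapshot and manifest metadata, which this sketch leaves out; the function names are hypothetical.

```python
def assign_first_row_ids(next_row_id, record_counts):
    """Give each data file a starting row ID and advance next-row-id.

    record_counts lists the number of rows in each data file, in commit
    order. Returns (first_row_ids, new_next_row_id).
    """
    first_row_ids = []
    for count in record_counts:
        first_row_ids.append(next_row_id)
        next_row_id += count
    return first_row_ids, next_row_id


def row_id(first_row_id, pos):
    """_row_id of the row at 0-based position pos within its data file."""
    return first_row_id + pos


# A freshly upgraded table starts its counter at 0 in this sketch.
first_ids, next_id = assign_first_row_ids(0, [3, 2])

if __name__ == "__main__":
    print(first_ids, next_id)
    print(row_id(first_ids[1], 1))
```

Because the counter only moves forward, IDs are never reused, which is what lets a row be tracked across later commits.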
  24. Quick intro: Iceberg V4 Spec
  25. Iceberg V4 Spec
    • The V3 Spec was finalized around the summer of 2025.
    • The V4 Spec is currently under discussion.
    • V4 topics:
      § Single file commits & Adaptive metadata tree (the doc was merged)
        – proposal doc: https://s.apache.org/iceberg-single-file-commit
        – Learn: https://qiita.com/m-masataka/items/aa3de63618e2d48433a6
      § Relative Path Spec (Issue#13141)
        – proposal doc: https://s.apache.org/iceberg-spec-relative-path
      § Column stats improvement (Issue#13153)
        – proposal doc: https://s.apache.org/iceberg-column-stats
  26. Key takeaways
    • Iceberg format versions and the evolution of Iceberg
    • Some V3 features are still under development in each engine.
    • The Variant type enables more efficient processing of semi-structured values.
    • Deletion Vectors can reduce storage cost and improve read/write performance.
    • Migration steps to V3 (readers/writers first, then the format version) and related considerations
    • Iceberg V4 Spec