Slide 1

Slide 1 text

Apache Iceberg V3: Latest Status and Migration
Apache Iceberg Meetup #4
Tomohiro Tanaka, Senior Cloud Support Engineer, AWS
© 2025, Amazon Web Services, Inc. or its affiliates. All rights reserved.

Slide 2

Slide 2 text

Tomohiro Tanaka
Senior Cloud Support Engineer, AWS Support, Amazon Web Services
• Responsible for resolving the most complex issues and advising on best practices with Iceberg
• Contributes to the Apache Iceberg OSS project

Slide 3

Slide 3 text

Agenda
• Iceberg format versions
• Features in Iceberg V3 Spec
• Migration from V2 to V3
• Quick intro: Iceberg V4 Spec

Slide 4

Slide 4 text

Iceberg format version
• Defines the specification that Iceberg clients must follow
• Not the same as the Iceberg library version (e.g. iceberg-1.10.1)
• Incremented when a change breaks forward compatibility, i.e. when older clients cannot read tables that use the new features
• Once the spec for a version is finalized, new features go into the next version
• Stored in the table's metadata.json file
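As a minimal sketch of where the format version lives, the snippet below reads the `format-version` field from a metadata.json document. The JSON content here is a hypothetical, heavily trimmed sample (real metadata files carry many more fields), and `read_format_version` is an illustrative helper, not part of any Iceberg library.

```python
import json

# Hypothetical, trimmed metadata.json content used only for illustration.
metadata_json = '{"format-version": 3, "table-uuid": "00000000-0000-0000-0000-000000000000"}'


def read_format_version(raw: str) -> int:
    """Return the table format version recorded in a metadata.json document."""
    return int(json.loads(raw)["format-version"])


version = read_format_version(metadata_json)
print(f"table format version: V{version}")
```

Note that this value describes the table, not the library: the same library release can work with tables of several format versions.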

Slide 5

Slide 5 text

History of Iceberg format versions
• Version 1: Iceberg's fundamental table mechanisms
• Version 2: Row-level deletes
  § https://github.com/apache/iceberg/milestone/4
• Version 3 (latest): Extended data types and capabilities
  § https://github.com/apache/iceberg/milestone/42
• Version 4 (under development): Adaptive table tree structure and single-file commits, relative paths, column stats improvements, etc.
  § https://github.com/apache/iceberg/milestone/58
Ref: https://github.com/apache/iceberg/milestones

Slide 6

Slide 6 text

Features in Iceberg V3 Spec
Extended data types:
• Unknown
• Variant
• Timestamp (9), with or without time zone
• Geo (Geometry/Geography)
• New type promotions
New capabilities:
• Deletion Vectors
• Row Lineage
• Default values
• Multi-argument transforms
• Table encryption keys

Slide 8

Slide 8 text

Variant type

Slide 9

Slide 9 text

Variant type
• Proposal: https://github.com/apache/iceberg/issues/10392
  § doc: https://docs.google.com/document/d/1sq70XDiWJ2DemWyA5dVB80gKzwi0CWoM0LOWM7VJVd8/
• Stores semi-structured data (e.g. JSON or Avro) in a single column, rather than as a String
• Binary encoding of the semi-structured data improves performance compared with storing it as a String
• To extract typed values from a variant, use the variant_get function (in Spark)
• Supported in Iceberg 1.10.0+ and Spark 4.0+

Slide 10

Slide 10 text

Example (1/3): DDL (Spark SQL)

    CREATE TABLE variant (txid int, data variant)
    USING iceberg
    TBLPROPERTIES ('format-version'='3');

    DESCRIBE EXTENDED variant;

Slide 11

Slide 11 text

Example (2/3): Write

    INSERT INTO variant VALUES
      (1, parse_json('{"device_id": "a001", "timestamp": "2026-01-21T13:00:00", "color": "red"}')),
      (2, parse_json('{"device_id": "a002", "timestamp": "2026-01-21T13:05:00", "color": "blue"}')),
      (3, parse_json('{"device_id": "a003", "timestamp": "2026-01-21T13:10:00", "color": "green"}'));

Slide 12

Slide 12 text

Example (3/3): Read a variant field

    SELECT
      variant_get(data, '$.device_id', 'string') AS dev_id,
      variant_get(data, '$.color', 'string') AS device_color
    FROM variant;

Slide 13

Slide 13 text

Deletion Vectors

Slide 14

Slide 14 text

V2 Row-level deletes

Data file 01234-5-abdcefg.parquet:

    id      drink        price
    1       milk         3.00
    2       cocoa        4.00
    3       espresso     5.00
    …       …            …
    100000  white mocha  6.00

    DELETE FROM db.tbl WHERE id = 2

File-level deletes: Create a new file without deleted records.

    id      drink        price
    1       milk         3.00
    3       espresso     5.00
    …       …            …
    100000  white mocha  6.00

Row-level deletes: Only delete files are created.

Delete file 67890-1-hijklmn-deletes.parquet:

    file_path              pos
    s3://bucket/a.parquet  1
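The row-level delete shown on this slide can be sketched as follows: a reader loads the (file_path, pos) pairs from the delete file and skips the matching row positions when scanning the data file. This is a simplified illustration in plain Python, not Iceberg's reader code; `apply_position_deletes` is a hypothetical helper, and positions are assumed to be 0-based, which matches `pos = 1` for the row with `id = 2` on the slide.

```python
# Rows of one data file, in file order (positions 0, 1, 2, ...).
data_file_path = "s3://bucket/a.parquet"
rows = [(1, "milk", 3.00), (2, "cocoa", 4.00), (3, "espresso", 5.00)]

# Position delete entries as (file_path, pos), mirroring the delete file on the slide.
position_deletes = [("s3://bucket/a.parquet", 1)]


def apply_position_deletes(path, file_rows, deletes):
    """Return the rows of one data file that are not marked as deleted."""
    deleted = {pos for file_path, pos in deletes if file_path == path}
    return [row for pos, row in enumerate(file_rows) if pos not in deleted]


live_rows = apply_position_deletes(data_file_path, rows, position_deletes)
print(live_rows)
```

The data file itself is never rewritten; only the small delete file is added, and the merge happens at read time.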

Slide 15

Slide 15 text

Deletion Vectors
• Proposal: https://github.com/apache/iceberg/issues/11122
  § doc: https://docs.google.com/document/d/18Bqhr-vnzFfQk1S4AgRISkA_5_m5m32Nnc2Cw0zn2XM
• Bitmap-based deletion markers
• An enhancement of the row-level deletes introduced in V2
• Requires merge-on-read write mode
• Reduces delete file size and offers better read/write performance compared with V2
• Stored in a "Puffin" format file
• Supported in Iceberg 1.8.0+ and Spark 3.5+
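To illustrate the idea of bitmap-based deletion markers, here is a toy sketch that tracks deleted row positions of a single data file as bits in a Python integer. Real deletion vectors are serialized as Roaring bitmaps; the plain-integer bitset and the `DeletionVectorSketch` class are stand-ins assumed here purely for illustration.

```python
class DeletionVectorSketch:
    """Toy bitmap of deleted row positions for a single data file."""

    def __init__(self) -> None:
        self.bits = 0

    def delete(self, pos: int) -> None:
        # Setting a bit twice is harmless, so repeated deletes are idempotent.
        self.bits |= 1 << pos

    def is_deleted(self, pos: int) -> bool:
        return (self.bits >> pos) & 1 == 1

    def cardinality(self) -> int:
        # Number of positions currently marked as deleted.
        return bin(self.bits).count("1")


dv = DeletionVectorSketch()
for pos in (1, 4, 5):
    dv.delete(pos)
dv.delete(4)  # deleting the same position again changes nothing

live_positions = [pos for pos in range(8) if not dv.is_deleted(pos)]
print(live_positions, dv.cardinality())
```

Compared with a V2 position delete file, which stores one (file_path, pos) row per deleted record, a bitmap needs no repeated file path and answers "is this position deleted?" with a single bit test.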

Slide 16

Slide 16 text

Puffin format

File layout:
• Magic number: 0x50, 0x46, 0x41, 0x31
• Blobs: Blob_0, Blob_1, …
• Footer

Footer layout:
• Magic number: 0x50, 0x46, 0x41, 0x31
• FooterPayload (JSON)
• FooterPayloadSize (4 bytes)
• Magic number

Blob_0 (type: deletion-vector-v1) layout:
• length (4 bytes)
• magic bytes (4 bytes): D1 D3 39 64
• Deletion vector (serialized as Roaring bitmaps)
• CRC-32 checksum
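The layout on this slide can be explored with a small parser sketch. It builds a tiny Puffin-like file in memory, then locates the footer by reading backwards from the end of the file. Two assumptions go beyond the slide: following the Puffin spec, the footer carries a 4-byte Flags field between FooterPayloadSize and the trailing magic, and FooterPayloadSize is read as a little-endian integer. The footer JSON below is trimmed to three fields, and `build_sample` and `read_footer` are hypothetical helpers, not library functions.

```python
import json
import struct

MAGIC = bytes([0x50, 0x46, 0x41, 0x31])  # ASCII "PFA1"
TRAILER_SIZE = 12  # FooterPayloadSize (4) + Flags (4, assumed from the Puffin spec) + Magic (4)


def build_sample(blob: bytes) -> bytes:
    """Assemble a tiny Puffin-like file in memory (sample data only)."""
    payload = json.dumps(
        {"blobs": [{"type": "deletion-vector-v1", "offset": len(MAGIC), "length": len(blob)}]}
    ).encode("utf-8")
    footer = MAGIC + payload + struct.pack("<i", len(payload)) + bytes(4) + MAGIC
    return MAGIC + blob + footer


def read_footer(data: bytes) -> dict:
    """Validate the magic numbers and return the parsed footer payload."""
    if data[:4] != MAGIC or data[-4:] != MAGIC:
        raise ValueError("not a Puffin file: magic number mismatch")
    (payload_size,) = struct.unpack("<i", data[-TRAILER_SIZE:-TRAILER_SIZE + 4])
    payload_start = len(data) - TRAILER_SIZE - payload_size
    if data[payload_start - 4:payload_start] != MAGIC:
        raise ValueError("footer does not start with the magic number")
    return json.loads(data[payload_start:len(data) - TRAILER_SIZE])


sample = build_sample(b"\x00\x01\x02")  # placeholder bytes, not a real deletion vector
entry = read_footer(sample)["blobs"][0]
blob_bytes = sample[entry["offset"]:entry["offset"] + entry["length"]]
print(entry["type"], blob_bytes)
```

Because the footer records each blob's offset and length, a reader can fetch a single deletion vector without scanning the whole file.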

Slide 17

Slide 17 text

Example (1/3): table records (Spark SQL)

    SELECT * FROM review

Slide 18

Slide 18 text

Example (2/3): delete in V2

    DELETE FROM review WHERE review_year <= 2015

/path/to/warehouse/data
- 00000-229-ff2ba929-305a-4a39-a1d4-1ac7f513c1d4-0-00001.parquet
- 00001-230-ff2ba929-305a-4a39-a1d4-1ac7f513c1d4-0-00001.parquet
- 00002-231-ff2ba929-305a-4a39-a1d4-1ac7f513c1d4-0-00001.parquet
- 00000-110-eb243d8e-bfd4-4b9c-833a-e1e041f07c1c-00001-deletes.parquet
- 00001-111-eb243d8e-bfd4-4b9c-833a-e1e041f07c1c-00001-deletes.parquet
- 00002-112-eb243d8e-bfd4-4b9c-833a-e1e041f07c1c-00001-deletes.parquet

Tip: The number of delete files can be controlled with write.delete.granularity (default in Spark: file).

Slide 19

Slide 19 text

Example (3/3): delete in V3

    DELETE FROM review WHERE review_year <= 2015

/path/to/warehouse/data
- 00000-229-ff2ba929-305a-4a39-a1d4-1ac7f513c1d4-0-00001.parquet
- 00001-230-ff2ba929-305a-4a39-a1d4-1ac7f513c1d4-0-00001.parquet
- 00002-231-ff2ba929-305a-4a39-a1d4-1ac7f513c1d4-0-00001.parquet
- 00000-110-eb243d8e-bfd4-4b9c-833a-e1e041f07c1c-00001-deletes.puffin
- 00001-111-eb243d8e-bfd4-4b9c-833a-e1e041f07c1c-00001-deletes.puffin
- 00002-112-eb243d8e-bfd4-4b9c-833a-e1e041f07c1c-00001-deletes.puffin

Slide 20

Slide 20 text

Comparison between V2 and V3 delete files
[Figure: data files, delete files in V2, delete files in V3]

Slide 21

Slide 21 text

Migration from Iceberg V2 to V3

Slide 22

Slide 22 text

How can I migrate Iceberg V2 to V3?
• Run:
  § ALTER TABLE db.tbl SET TBLPROPERTIES ('format-version'='3') (Spark SQL)
• Update the table format version last. Update readers/writers (and the REST Catalog) first.
• Note that:
  § The compute engines you use must implement the V3 features you rely on.
  § It is not possible to revert to an older version (e.g. V3 to V2).
  § Compatibility is one-way: readers/writers that support a newer version can read from and write to tables of an older version, but not the reverse.
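The ordering rule above (clients first, format version last, no downgrade) can be expressed as a small pre-flight check. This is a hypothetical sketch: the client names and their maximum supported versions are made up for illustration, and `upgrade_blockers` is not an Iceberg API.

```python
def upgrade_blockers(table_version: int, target_version: int, client_max_versions: dict) -> list:
    """Return the reasons an upgrade should wait; an empty list means it can proceed."""
    problems = []
    if target_version < table_version:
        problems.append("downgrade is not possible")
    for name, max_version in sorted(client_max_versions.items()):
        if max_version < target_version:
            problems.append(f"{name} supports only up to V{max_version}")
    return problems


# Hypothetical inventory of readers/writers and the highest format version each supports.
clients = {"spark-etl": 3, "legacy-reader": 2}

blockers = upgrade_blockers(table_version=2, target_version=3, client_max_versions=clients)
print(blockers)
```

Only once such a check comes back empty for every reader, writer, and catalog would the ALTER TABLE statement be run.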

Slide 23

Slide 23 text

Other considerations for V3 migration
• Deletion Vectors:
  § Readers with V3 support can still read V2 position delete files after the V3 migration.
  § Once a V3 delete occurs, the content of the existing V2 position delete files is moved into V3 Puffin files.
  § Running the rewrite_position_delete_files Spark procedure on a V3 table merges the existing V2 position delete files into V3 Puffin files.

Slide 24

Slide 24 text

Other considerations for V3 migration
• Row Lineage:
  § After the version is changed to V3, next-row-id is assigned to the table, but the existing records are not affected.
  § Once table records are changed, _row_id is assigned to each row, and _last_updated_sequence_number is inherited.
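A simplified sketch of how row IDs could be derived from next-row-id: each newly committed data file receives a starting ID taken from the table's next-row-id, a row's _row_id is that starting ID plus its position in the file, and next-row-id advances by the number of rows covered. This deliberately ignores manifests and other details of the spec; `assign_first_row_ids` and `row_id` are hypothetical helpers.

```python
def assign_first_row_ids(next_row_id: int, record_counts: list) -> tuple:
    """Give each new data file a starting row ID and return the advanced next-row-id."""
    first_row_ids = []
    for count in record_counts:
        first_row_ids.append(next_row_id)
        next_row_id += count
    return first_row_ids, next_row_id


def row_id(first_row_id: int, pos: int) -> int:
    """Derive _row_id for the row at position pos (0-based) in a data file."""
    return first_row_id + pos


# Hypothetical commit that adds two data files with 3 and 2 records to a table
# whose next-row-id is currently 100.
first_ids, new_next_row_id = assign_first_row_ids(100, [3, 2])
second_file_row_ids = [row_id(first_ids[1], pos) for pos in range(2)]
print(first_ids, new_next_row_id, second_file_row_ids)
```

Because IDs are derived from a per-file starting value, data files written before the upgrade do not need to be rewritten just to carry row IDs.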

Slide 25

Slide 25 text

Quick intro: Iceberg V4 Spec

Slide 26

Slide 26 text

Iceberg V4 Spec
• The V3 Spec was finalized around the summer of 2025.
• The V4 Spec is currently under discussion.
• V4 topics:
  § Single file commits & Adaptive metadata tree (the doc was merged)
    – proposal doc: https://s.apache.org/iceberg-single-file-commit
    – Learn: https://qiita.com/m-masataka/items/aa3de63618e2d48433a6
  § Relative Path Spec (Issue#13141)
    – proposal doc: https://s.apache.org/iceberg-spec-relative-path
  § Column stats improvement (Issue#13153)
    – proposal doc: https://s.apache.org/iceberg-column-stats

Slide 27

Slide 27 text

Summary

Slide 28

Slide 28 text

Key takeaways
• Iceberg format versions and the evolution of Iceberg
• Some V3 features are still under development in individual engines.
• The Variant type enables faster processing of semi-structured values.
• Deletion Vectors can reduce storage cost and improve read/write performance.
• Migration steps to V3 (update readers/writers first, then the format version) and related considerations
• Iceberg V4 Spec

Slide 29

Slide 29 text

Thank you!