• Most of our existing scientific data formats (e.g. HDF5, FITS, ROOT)
are NOT cloud-optimized (access on object storage is inefficient)
• Adopting cloud-optimized (CO) formats (e.g. Parquet, Zarr) can be
confusing for users and data providers
• Transcoding legacy data to analysis-ready, cloud-optimized (ARCO)
formats can be tedious and complicated
• Some clever workarounds exist, e.g. kerchunk, which indexes the chunks
inside legacy files so they can be read in place:
https://github.com/fsspec/kerchunk
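
To make the indexing idea behind tools like kerchunk concrete, here is a minimal standard-library sketch. It does not use kerchunk's API. The file layout, the header, the `data/<i>` keys, and the helper names (`write_legacy_file`, `build_references`, `read_chunk`) are all hypothetical, chosen only for illustration. The sketch scans a "legacy" binary file once, records where each chunk lives as a `[path, offset, length]` entry, and then reads a single chunk by byte range without transcoding the file.

```python
import json
import os
import struct
import tempfile

# Hypothetical "legacy" layout: a 16-byte header followed by three chunks,
# each holding four little-endian float64 values, stored back to back.
HEADER = b"LEGACY".ljust(16, b"\x00")
CHUNK_FORMAT = "<4d"
CHUNKS = [
    [0.0, 1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0, 7.0],
    [8.0, 9.0, 10.0, 11.0],
]


def write_legacy_file(path):
    """Write the toy legacy file: header first, then the packed chunks."""
    with open(path, "wb") as f:
        f.write(HEADER)
        for chunk in CHUNKS:
            f.write(struct.pack(CHUNK_FORMAT, *chunk))


def build_references(path):
    """Record where each chunk lives: key -> [path, offset, length]."""
    refs = {}
    offset = len(HEADER)
    length = struct.calcsize(CHUNK_FORMAT)
    for i in range(len(CHUNKS)):
        refs[f"data/{i}"] = [path, offset, length]
        offset += length
    return refs


def read_chunk(refs, key):
    """Fetch one chunk by byte range, leaving the rest of the file unread.

    On object storage the seek-and-read below would typically be a single
    ranged GET request (an assumption about the storage backend).
    """
    path, offset, length = refs[key]
    with open(path, "rb") as f:
        f.seek(offset)
        raw = f.read(length)
    return list(struct.unpack(CHUNK_FORMAT, raw))


tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "legacy.bin")
write_legacy_file(path)

# The index is small and serializable, so it can sit next to the original
# file as a JSON sidecar while the legacy data stays untouched.
refs = build_references(path)
sidecar = json.dumps(refs)

second = read_chunk(json.loads(sidecar), "data/1")
print(second)
```

The design point is that the expensive step (finding chunk boundaries) happens once, and the original file is never rewritten. A real legacy format would need a format-aware scanner in place of the fixed-size arithmetic in `build_references`.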
Challenge: Legacy Data Formats