Cubed-xarray lightning talk at SciPy2023

Short 4-minute talk on using the Cubed package as an alternative to dask.array for processing large datasets in Xarray.

Given as a lightning talk at the SciPy Conference 2023 in Austin, TX.

See this blog post for more details (https://xarray.dev/blog/cubed-xarray)

Tom Nicholas

July 17, 2023

Transcript

  1. Cubed: Bounded-Memory Serverless Array Processing (in Xarray)
    *Tom Nicholas
    Tom White
    *[email protected]
    *github.com/TomNicholas

  2. Big science means *Big* arrays
    😍
    😬
    PBs??

  3. So use dask.array!
    Dask is great, but it doesn’t always succeed…
    Sometimes it unexpectedly exceeds your RAM budget 😕
    Q: Can we guarantee that distributed array execution respects RAM constraints?

  4. Rechunker
    ✨Cubed✨
    (Bounded-memory)
    A: Yes! For certain operations…
    🤔
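
As a rough illustration of what "bounded-memory" can mean in practice, the sketch below checks a plan of chunk-wise tasks against a memory budget before anything runs. It is not Cubed's actual implementation: the function names (`chunk_nbytes`, `projected_task_mem`, `check_plan`), the `ITEMSIZE` table, and the rule of counting every chunk twice (one compressed copy plus one decompressed copy, assuming compression saves nothing) are all assumptions made for this example.

```python
import math

# Bytes per element for a few common dtypes (illustrative subset, assumed here).
ITEMSIZE = {"float32": 4, "float64": 8, "int32": 4, "int64": 8}


def chunk_nbytes(chunk_shape, dtype):
    """Size in bytes of one in-memory chunk with the given shape and dtype."""
    return math.prod(chunk_shape) * ITEMSIZE[dtype]


def projected_task_mem(input_chunks, output_chunk, reserved_mem=0):
    """Upper bound on the memory a single chunk-wise task may need.

    Every chunk is counted twice: once for the bytes moved to or from
    storage and once for the decoded array (a conservative assumption).
    """
    total = reserved_mem
    for shape, dtype in [*input_chunks, output_chunk]:
        total += 2 * chunk_nbytes(shape, dtype)
    return total


def check_plan(tasks, allowed_mem):
    """Raise MemoryError at planning time if any task could exceed the budget."""
    for name, (inputs, output) in tasks.items():
        needed = projected_task_mem(inputs, output)
        if needed > allowed_mem:
            raise MemoryError(
                f"{name}: needs up to {needed} bytes, budget is {allowed_mem}"
            )
    return True


# Hypothetical plan: add two arrays chunk by chunk, 1000 x 1000 float64 chunks.
chunk = ((1000, 1000), "float64")
tasks = {"add": ([chunk, chunk], chunk)}
check_plan(tasks, allowed_mem=100_000_000)  # passes: the bound is 48 MB
```

The point of the design is that the failure, if any, happens when the plan is built rather than halfway through a long computation.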

  5. Cubed’s Design
    Invented by [name shown only as an image on the slide]

  6. Coiled Functions

    Serverless execution
    Deploy one serverless container per chunk - read from / write to Zarr
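
The one-task-per-chunk pattern can be sketched with the standard library alone. This is a toy model under stated assumptions, not how Cubed or any serverless platform is actually invoked: plain dictionaries stand in for Zarr stores, threads stand in for serverless containers, and `process_chunk` and `run_per_chunk` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def process_chunk(key, source, target, func):
    """One stateless task: read one chunk, transform it, write one chunk."""
    data = source[key]        # read a single chunk from the input store
    target[key] = func(data)  # write the result chunk to the output store
    return key


def run_per_chunk(source, target, func, max_workers=4):
    """Launch one independent task per chunk key and wait for all of them."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(process_chunk, key, source, target, func)
            for key in source
        ]
        return sorted(f.result() for f in futures)


# Dictionaries keyed by chunk id stand in for the input and output stores.
source = {"0": [1, 2], "1": [3, 4]}
target = {}
done = run_per_chunk(source, target, lambda xs: [2 * x for x in xs])
```

Because tasks share nothing except the stores, each one only ever holds its own chunk in memory, and a failed task can simply be retried.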

  7. Xarray wraps Cubed OR Dask OR [new things??]
    [Diagram: rows labelled "Tabular data:" and "Array data:", each marked "Executes via"; the execution backends that survive as text are Cubed and "??"]

  8. Read the blog post! https://xarray.dev/blog/cubed-xarray
    Also thanks Tom White for writing Cubed!
    Join the discussion!
