Dask on HPC in 2024

Jacob Tomlinson

November 12, 2024

Transcript

  1. Dask on HPC in 2024

    Jacob Tomlinson, Dask Maintainer and RAPIDS Developer. Pangeo CNES 2024
  2. Dask Distributed: Dask cluster process architecture

    Client: your Dask code that runs the business logic of your workflow. It converts code to task graphs instead of executing it directly.
    Scheduler: receives task graphs and coordinates the execution of those tasks. It also makes autoscaling decisions.
    Workers: execute individual tasks on remote machines.
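To make the client/scheduler/worker split concrete, here is a toy, single-machine sketch using only the Python standard library. The graph follows the classic dict layout from Dask's task graph specification (each key maps either to a literal or to a `(function, *args)` tuple whose arguments may name other keys). The `run_graph` "scheduler" and the thread pool standing in for workers are hypothetical simplifications for illustration, not how `distributed` is implemented.

```python
from concurrent.futures import ThreadPoolExecutor
from operator import add, mul


def inc(x):
    return x + 1


# "Client" side: describe the work as a task graph instead of running it.
# Each key maps to a literal or to (function, *args); string args that are keys are dependencies.
graph = {
    "x": 1,
    "y": (inc, "x"),
    "z": (add, "y", 10),
    "w": (mul, "y", "z"),
}


def run_graph(graph, max_workers=2):
    """Toy scheduler: dispatch each task to the pool once all of its dependencies are done."""
    results = {}
    pending = dict(graph)
    with ThreadPoolExecutor(max_workers=max_workers) as workers:
        while pending:
            # Find every task whose dependencies already have results
            ready = {}
            for key, task in pending.items():
                if not isinstance(task, tuple):
                    ready[key] = None  # literal value, no work needed
                    continue
                _, *args = task
                deps = [a for a in args if isinstance(a, str) and a in graph]
                if all(d in results for d in deps):
                    ready[key] = task
            if not ready:
                raise ValueError("cycle or missing dependency in graph")
            # Submit all ready tasks to the "workers" so they can run in parallel
            futures = {}
            for key, task in ready.items():
                if task is None:
                    results[key] = pending[key]
                else:
                    func, *args = task
                    resolved = [results[a] if isinstance(a, str) and a in graph else a for a in args]
                    futures[key] = workers.submit(func, *resolved)
            for key, fut in futures.items():
                results[key] = fut.result()
            for key in ready:
                del pending[key]
    return results


print(run_graph(graph)["w"])  # → 24
```

The point of the split is that building `graph` is cheap and does no work; only the scheduler decides when and where each task runs, which is also what lets a real scheduler make autoscaling decisions.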
  3. Clusters vs Runners: Deployment Paradigms

    Batch Runner: the workload starts as a multi-node job. Nodes coordinate at startup to elect a scheduler and run the client code.
    Dynamic Cluster: the workload starts as a single-node job. Dask spawns multiple single-node worker jobs dynamically as they are required.
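The batch-runner idea of electing roles at startup can be sketched without any cluster. The hypothetical `assign_role` helper below follows the layout dask-mpi documents for its `initialize()` function (rank 0 runs the scheduler, rank 1 runs the client script, every other rank becomes a worker). Reading the rank from SLURM's `SLURM_PROCID` environment variable is an assumption for illustration, not necessarily how `SLURMRunner` elects roles.

```python
import os


def assign_role(rank: int) -> str:
    """Hypothetical role election: every task runs the same script and picks a role from its rank."""
    if rank < 0:
        raise ValueError("rank must be non-negative")
    if rank == 0:
        return "scheduler"
    if rank == 1:
        return "client"
    return "worker"


def roles_for_job(n_tasks: int) -> dict:
    """Count how many tasks in an n-task batch job end up in each role."""
    counts = {"scheduler": 0, "client": 0, "worker": 0}
    for rank in range(n_tasks):
        counts[assign_role(rank)] += 1
    return counts


if __name__ == "__main__":
    # Under srun, SLURM exports each task's global rank as SLURM_PROCID (default to 0 outside SLURM)
    rank = int(os.environ.get("SLURM_PROCID", "0"))
    print(f"task {rank} acts as the {assign_role(rank)}")
    print(roles_for_job(100))  # → {'scheduler': 1, 'client': 1, 'worker': 98}
```

Under this layout a 100-task job yields one scheduler, one client and 98 workers, all started together; a dynamic cluster instead starts with just the scheduler and client and adds worker jobs later.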
  4. dask-jobqueue: Directly interact with queues

    • Has tools for dynamic clusters and runners
    • Supports many schedulers including PBS, SLURM, SGE, OAR and more
    • Integrates well with other Dask tooling like Dask’s JupyterLab extension
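dask-jobqueue exposes one cluster class per batch system, for example `SLURMCluster`, `PBSCluster`, `SGECluster`, `OARCluster`, `LSFCluster` and `HTCondorCluster`. The hypothetical helper below picks a class name by checking which submit command is on `PATH`. The command-to-class table is an illustrative assumption; in particular PBS and SGE both provide `qsub`, so this sketch simply assumes PBS and a real site would need a more specific check.

```python
import shutil
from typing import Callable, Optional

# Illustrative mapping from a batch system's submit command to a dask-jobqueue class name.
# "qsub" is checked last because it is ambiguous (PBS and SGE both provide it).
SUBMIT_COMMANDS = [
    ("sbatch", "SLURMCluster"),
    ("oarsub", "OARCluster"),
    ("bsub", "LSFCluster"),
    ("condor_submit", "HTCondorCluster"),
    ("qsub", "PBSCluster"),
]


def detect_cluster_class(which: Callable[[str], Optional[str]] = shutil.which) -> Optional[str]:
    """Return the dask-jobqueue class name for the first submit command found, or None."""
    for command, class_name in SUBMIT_COMMANDS:
        if which(command) is not None:
            return class_name
    return None


if __name__ == "__main__":
    print(detect_cluster_class() or "no supported batch scheduler found on PATH")
```

The `which` parameter is injectable so the lookup can be tested without a batch system installed. On a real system, the returned name could be resolved with `getattr(dask_jobqueue, name)`, since these classes are importable from the top-level `dask_jobqueue` package.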
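  5. Cluster Example: Interactive dynamic scaling

    from dask.distributed import Client
    from dask_jobqueue.slurm import SLURMCluster

    # Each SLURM job started by this cluster provides 1 core and 4GB of memory
    cluster = SLURMCluster(cores=1, memory="4GB")
    cluster.scale(2)  # ask for 2 workers; dask-jobqueue submits the batch jobs
    client = Client(cluster)
    ...
    client.close()
    cluster.close()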
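`cluster.scale(2)` asks for two workers, and dask-jobqueue turns that request into batch jobs. A rough model, assuming each job starts `processes` worker processes (one here, since `cores=1`) and splits its `cores` and `memory` evenly between them, is that the job count is the requested worker count divided by workers per job, rounded up. The hypothetical helpers below only reproduce that arithmetic; they do not call dask-jobqueue.

```python
import math


def jobs_for_workers(n_workers: int, processes_per_job: int = 1) -> int:
    """Hypothetical helper: batch jobs needed so at least n_workers worker processes run."""
    if n_workers < 0 or processes_per_job < 1:
        raise ValueError("need n_workers >= 0 and processes_per_job >= 1")
    return math.ceil(n_workers / processes_per_job)


def resources_per_worker(cores: int, memory_gb: float, processes: int) -> tuple:
    """Split one job's cores (as threads) and memory evenly across its worker processes."""
    return cores // processes, memory_gb / processes


# SLURMCluster(cores=1, memory="4GB") + scale(2): one worker per job, so two jobs
print(jobs_for_workers(2, processes_per_job=1))  # → 2
# A job with cores=8, memory="32GB", processes=4 gives four 2-thread, 8GB workers
print(resources_per_worker(8, 32, 4))  # → (2, 8.0)
# Under this model, 10 such workers need 3 jobs (the last job is partly surplus)
print(jobs_for_workers(10, processes_per_job=4))  # → 3
```

Keeping one worker per job, as in the slide, makes every `scale` step map to exactly one queue submission, at the cost of more jobs for the batch scheduler to track.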
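  6. Runner Example: Batch workloads

    # runner.py
    from dask.distributed import Client
    from dask_jobqueue.slurm import SLURMRunner

    # Every SLURM task runs this script; the runner coordinates the tasks into a
    # scheduler, a client and workers, sharing the scheduler address via the file
    with SLURMRunner(scheduler_file="scheduler-{job_id}.json") as runner:
        with Client(runner) as client:
            # Block until all workers in the allocation have connected
            client.wait_for_workers(runner.n_workers)
            ...

    $ srun -n 100 python runner.py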
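`client.wait_for_workers(runner.n_workers)` blocks until the expected number of workers have connected, so the workflow does not start on a partially assembled cluster (the real `Client.wait_for_workers` also accepts a `timeout`). The general polling pattern looks roughly like this stdlib sketch; `wait_for_count` and the simulated registration thread are hypothetical and are not how `distributed` implements the method.

```python
import threading
import time
from typing import Callable, Optional


def wait_for_count(get_count: Callable[[], int], n: int,
                   timeout: Optional[float] = None, interval: float = 0.01) -> int:
    """Poll get_count() until it reaches n; raise TimeoutError if timeout (seconds) elapses first."""
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        count = get_count()
        if count >= n:
            return count
        if deadline is not None and time.monotonic() >= deadline:
            raise TimeoutError(f"only {count} of {n} workers connected")
        time.sleep(interval)


# Simulate workers connecting one by one from a background thread
connected = []
lock = threading.Lock()


def register_workers(total: int) -> None:
    for i in range(total):
        time.sleep(0.005)
        with lock:
            connected.append(f"worker-{i}")


def current_count() -> int:
    with lock:
        return len(connected)


thread = threading.Thread(target=register_workers, args=(4,))
thread.start()
print(wait_for_count(current_count, 4, timeout=5))  # → 4
thread.join()
```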
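  7. dask-mpi: Batch workloads on any MPI system

    # mpi-runner.py
    from dask_mpi import initialize
    initialize()  # rank 0 runs the scheduler, rank 1 continues as the client, other ranks become workers

    from dask.distributed import Client
    client = Client()  # connects to the scheduler started by initialize()

    $ srun -n 100 python mpi-runner.py

    Alternatively, start the cluster with the dask-mpi command line tool and connect through a scheduler file:

    $ mpirun -np 4 dask-mpi --scheduler-file scheduler.json

    from distributed import Client
    client = Client(scheduler_file='scheduler.json')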
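In the scheduler-file pattern, the scheduler writes its address to `scheduler.json` and clients connect with `Client(scheduler_file=...)`; the hand-off is simply a file on a filesystem that both sides can see. This stdlib sketch imitates that hand-off in a temporary directory. The single `"address"` field and the `tcp://10.0.0.1:8786` value are illustrative assumptions (the file Dask writes contains more information), and `publish_address`/`read_address` are hypothetical names.

```python
import json
import os
import tempfile
import threading
import time


def publish_address(path: str, address: str) -> None:
    """Scheduler side: write to a temp file, then rename, so readers never see a half-written file."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump({"address": address}, f)
    os.replace(tmp_path, path)  # rename is atomic on POSIX filesystems


def read_address(path: str, timeout: float = 5.0, interval: float = 0.01) -> str:
    """Client side: wait for the scheduler file to appear, then return the address it contains."""
    deadline = time.monotonic() + timeout
    while not os.path.exists(path):
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{path} did not appear within {timeout} seconds")
        time.sleep(interval)
    with open(path) as f:
        return json.load(f)["address"]


with tempfile.TemporaryDirectory() as shared_dir:
    scheduler_file = os.path.join(shared_dir, "scheduler.json")
    # Simulate the scheduler starting slightly after the client script
    writer = threading.Timer(0.05, publish_address, args=(scheduler_file, "tcp://10.0.0.1:8786"))
    writer.start()
    print(read_address(scheduler_file))  # → tcp://10.0.0.1:8786
    writer.join()
```

Writing to a temporary name and renaming avoids a client reading a partially written file, which matters when the scheduler and client start at nearly the same moment on different nodes.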
  8. dask-gateway: Centrally managed cluster spawning

    IN MAINTENANCE MODE. If you rely on Dask Gateway, we need contributors!
  9. Roadmap: Longer-term plans for Dask on HPC

    • Add more Runners to dask-jobqueue for other schedulers
    • Migrate dask-mpi into dask-jobqueue as a Runner
    • Improve dask-cuda compatibility in dask-jobqueue
    • Build out more Dask on HPC documentation and resources