Dask on HPC in 2024 - Lightning Talk

1 Dask on HPC in 2024
Jacob Tomlinson, Dask Maintainer and RAPIDS Developer
Pangeo CNES 2024

2 Dask Scales Python
Lazy computation, out of core, distributed execution
NumPy, pandas, scikit-learn

3 Graph Generation
Dask converts code to a graph and executes it
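To make the graph idea concrete, here is a minimal sketch using the dict-of-tasks layout that Dask documents for its graphs, where a task is a tuple of a callable followed by its arguments. The `toy_get` function is a hypothetical stand-in written for this illustration, not Dask's scheduler, and the key names are made up.

```python
from operator import add, mul

# A graph in the dict layout Dask documents: each key maps to a literal
# value or to a task tuple of the form (callable, arg1, arg2, ...).
graph = {
    "x": 1,
    "y": 2,
    "sum": (add, "x", "y"),      # sum = x + y
    "result": (mul, "sum", 10),  # result = sum * 10
}


def toy_get(dsk, key):
    """Toy evaluator (not Dask's scheduler): resolve `key` recursively."""
    value = dsk[key]
    if isinstance(value, tuple) and value and callable(value[0]):
        func, *args = value
        # String arguments that name other keys are computed first;
        # everything else is passed through as a literal.
        resolved = [
            toy_get(dsk, a) if isinstance(a, str) and a in dsk else a
            for a in args
        ]
        return func(*resolved)
    return value


print(toy_get(graph, "result"))  # (1 + 2) * 10
```

Nothing runs when the dictionary is built; work only happens when a key is requested, which is the sense in which the computation is lazy.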

4 Dask Distributed
Dask cluster process architecture
Client: Your Dask code that runs the business logic of your workflow. Converts code to task graphs instead of executing directly.
Scheduler: Receives task graphs and coordinates the execution of those tasks. Also makes autoscaling decisions.
Workers: Execute individual tasks on remote machines.
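The division of labour between the three roles can be sketched with the standard library alone. This is a toy model under loud assumptions: threads stand in for remote workers, `toy_schedule` is a hypothetical function written for this note, and none of it reflects how the real Dask scheduler is implemented.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter
from operator import add, mul

# "Client" side: describe the work as tasks instead of running it directly.
# Each entry is key -> (function, names of the keys it depends on).
tasks = {
    "a": (lambda: 2, ()),
    "b": (lambda: 3, ()),
    "total": (add, ("a", "b")),
    "scaled": (mul, ("total", "a")),
}


def toy_schedule(tasks, n_workers=2):
    """Toy scheduler: hand each task to a worker once its inputs are done."""
    sorter = TopologicalSorter({key: deps for key, (_, deps) in tasks.items()})
    sorter.prepare()
    results = {}
    # The thread pool plays the part of the workers.
    with ThreadPoolExecutor(max_workers=n_workers) as workers:
        while sorter.is_active():
            ready = sorter.get_ready()
            # Tasks that become ready together do not depend on each other,
            # so they can be submitted at the same time.
            futures = {
                key: workers.submit(
                    tasks[key][0], *(results[d] for d in tasks[key][1])
                )
                for key in ready
            }
            for key, future in futures.items():
                results[key] = future.result()
                sorter.done(key)
    return results


results = toy_schedule(tasks)
print(results["scaled"])  # (2 + 3) * 2
```

The client only builds `tasks`; deciding what can run next is the scheduler's job, and the functions themselves run in the worker pool.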

5 Clusters vs Runners
Deployment Paradigms
Batch Runner: Workload starts as a multi-node job. Nodes coordinate at startup to elect a scheduler and run client code.
Dynamic Cluster: Workload starts as a single-node job. Dask spawns multiple single-node worker jobs dynamically as they are required.
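One simple way for the processes of a multi-node job to agree on roles is to derive them from each process's rank. The sketch below assumes the convention documented for dask-mpi (rank 0 runs the scheduler, rank 1 runs the client code, every other rank is a worker); whether a particular Runner uses exactly this rule is an assumption, and `role_for_rank` is a hypothetical helper, not part of any Dask package.

```python
def role_for_rank(rank, n_procs):
    """Hypothetical helper: map a process rank to a Dask role."""
    if n_procs < 3:
        raise ValueError("need at least 3 processes: scheduler, client and one worker")
    if not 0 <= rank < n_procs:
        raise ValueError("rank must be in the range [0, n_procs)")
    if rank == 0:
        return "scheduler"
    if rank == 1:
        return "client"
    return "worker"


# A job launched with 100 processes under this convention:
roles = [role_for_rank(rank, 100) for rank in range(100)]
print(roles.count("worker"))
```

Under this rule two processes are reserved for the scheduler and the client, so the number of workers is the process count minus two.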

6 dask-jobqueue
Directly interact with queues
• Has tools for dynamic clusters and runners
• Supports many schedulers including PBS, SLURM, SGE, OAR and more
• Integrates well with other Dask tooling like Dask's JupyterLab extension

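7 Cluster Example
Interactive dynamic scaling

    from dask.distributed import Client
    from dask_jobqueue.slurm import SLURMCluster

    # Describe the resources each SLURM worker job should request
    cluster = SLURMCluster(cores=1, memory="4GB")
    # Ask for two workers; jobs are submitted to the queue on demand
    cluster.scale(2)
    client = Client(cluster)
    ...
    # Release the workers and cancel the jobs when finished
    client.close()
    cluster.close()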

8 Runner Example
Batch workloads

    from dask.distributed import Client
    from dask_jobqueue.slurm import SLURMRunner

    # Every process in the job runs this same script
    with SLURMRunner(scheduler_file="scheduler-{job_id}.json") as runner:
        with Client(runner) as client:
            # Block until all workers in the job have connected
            client.wait_for_workers(runner.n_workers)
            ...

    $ srun -n 100 python runner.py

9 dask-mpi
Batch workloads on any MPI system

    from dask_mpi import initialize
    initialize()

    from dask.distributed import Client
    client = Client()

    $ srun -n 100 python mpi-runner.py

    $ mpirun -np 4 dask-mpi \
        --scheduler-file scheduler.json

    from distributed import Client
    client = Client(scheduler_file='scheduler.json')

10 dask-gateway
Centrally managed cluster spawning
IN MAINTENANCE MODE
If you rely on Dask Gateway, we need contributors!

11 Roadmap
Longer term plans for Dask on HPC
• Add more Runners to dask-jobqueue for other schedulers
• Migrate dask-mpi into dask-jobqueue as a Runner
• Improve dask-cuda compatibility in dask-jobqueue
• Build out more Dask on HPC documentation and resources
