Client
• Runs the code containing the business logic of your workflow
• Converts code to task graphs instead of executing it directly

Scheduler
• Receives task graphs and coordinates the execution of those tasks
• Also makes autoscaling decisions

Workers
• Execute individual tasks on remote machines
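To make the Client's role concrete, here is a minimal sketch of how code becomes a task graph rather than running immediately (the inc and add functions are illustrative assumptions, not part of the original):

import dask

@dask.delayed
def inc(x):
    # Stand-in for real business logic
    return x + 1

@dask.delayed
def add(x, y):
    return x + y

# Calling delayed functions only records tasks in a graph; nothing has executed yet
total = add(inc(1), inc(2))

# compute() hands the graph to a scheduler; with a distributed Client connected,
# the tasks are farmed out to workers
print(total.compute())  # 5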
Runners
• Workload starts as a multi-node job
• Nodes coordinate at startup to elect a scheduler and run client code

Dynamic clusters
• Workload starts as a single-node job
• Dask spawns multiple single-node worker jobs dynamically as they are required
dask-jobqueue
• Supports both dynamic clusters and runners (a dynamic-cluster sketch follows below)
• Supports many schedulers including PBS, SLURM, SGE, OAR and more
• Integrates well with other Dask tooling like Dask's JupyterLab extension
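For the dynamic-cluster model, a hedged sketch using dask-jobqueue's SLURMCluster; the queue name and resource sizes are assumptions for illustration, not values from the original:

from dask.distributed import Client
from dask_jobqueue import SLURMCluster

# Each Dask worker is submitted as its own single-node batch job
# (cores, memory and queue are illustrative values)
cluster = SLURMCluster(cores=4, memory="16GB", queue="general")

# Let Dask spawn and retire worker jobs as the workload requires
cluster.adapt(minimum=0, maximum=10)

client = Client(cluster)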
# runner.py
from dask.distributed import Client
from dask_jobqueue.slurm import SLURMRunner

with SLURMRunner(scheduler_file="scheduler-{job_id}.json") as runner:
    with Client(runner) as client:
        client.wait_for_workers(runner.n_workers)
        ...

$ srun -n 100 python runner.py
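At launch, srun starts 100 copies of runner.py; as described above, the processes coordinate at startup so that one becomes the scheduler, one continues on to run the client code, and the rest become workers, discovering each other via the shared scheduler file.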
Future work
• Add more Runners to dask-jobqueue for other schedulers
• Migrate dask-mpi into dask-jobqueue as a Runner
• Improve dask-cuda compatibility in dask-jobqueue
• Build out more Dask on HPC documentation and resources