EuroSciPy17 - Distributed compute with Dask, Kubernetes and AWS

Video - https://www.youtube.com/watch?v=6nFllDegCTY

Dask is a flexible, Python-based parallel computing library. It enables interactive data analysis on large datasets and scales from your laptop to a cluster. This parallelization can speed up your analysis, but if your compute nodes sit idle you can end up burning a lot of money.

Dask has an interface through which it can ask for more or fewer resources. We've built an example system, using cluster management tools like Kubernetes and public cloud infrastructure, that lets you maximize the compute available when you need it while minimizing cost.
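
As a rough illustration of the idea of asking for more or fewer resources, the sketch below shows one possible scaling policy in plain Python. It is not Dask's API and not the system described in the talk: the function name `desired_workers`, the `tasks_per_worker` ratio, and the worker limits are all hypothetical values chosen for the example.

```python
import math


def desired_workers(pending_tasks, tasks_per_worker=10,
                    min_workers=0, max_workers=20):
    """Hypothetical policy: worker count to request for the current backlog.

    All parameters are illustrative assumptions, not values from the talk.
    """
    if pending_tasks < 0:
        raise ValueError("pending_tasks must be non-negative")
    # Enough workers to cover the backlog at the assumed ratio...
    needed = math.ceil(pending_tasks / tasks_per_worker)
    # ...clamped so the cluster never shrinks below or grows past its limits.
    return max(min_workers, min(max_workers, needed))


if __name__ == "__main__":
    for backlog in (0, 25, 1000):
        print(backlog, "pending tasks ->", desired_workers(backlog), "workers")
```

With `min_workers=0`, an empty backlog scales the cluster down to nothing, which is where the cost saving would come from; the `max_workers` cap bounds spending during bursts.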

Jacob Tomlinson

August 31, 2017

Transcript

  1. Who