
Scaling Python Up and Out with Numba and Dask


An overview of Python for Data Science. In particular, a description of how Numba can be used to speed up your Python code by compiling array-oriented code to native machine code, and how Dask can be used to run your code in parallel across multiple cores and multiple machines.


Travis E. Oliphant

October 05, 2018

Transcript

  1. Scaling Python Up and Out with Numba and Dask
    Travis E. Oliphant, PyCon India Tutorial, October 5, 2018
    © 2018 Quansight - Confidential & Proprietary
  2. • MS/BS degrees in Electrical and Computer Engineering
    • PhD from Mayo Clinic in Biomedical Engineering (Ultrasound and MRI)
    • Creator and Developer of SciPy (1998-2009)
    • Professor at BYU (2001-2007): Inverse Problems
    • Creator and Developer of NumPy (2005-2012)
    • Started Numba and Conda (2012 - )
    • Founder of NumFOCUS / PyData
    • Python Software Foundation Director (2012)
    • Co-founder of Continuum Analytics => Anaconda, Inc.
    • CEO (2012) => Chief Data Scientist (2017)
    • Founder (2018) of Quansight
  3. 2012: Created Two Orgs for Sustainable Open Source
    • Community
    • Company: enterprise software company initially built on services and supporting open source. Became …
  4. Data Science Workflow
    [Diagram stages: Getting Data, Understand Data, Exploratory Data Analysis and Viz, Notebooks, Models, Data Products (Reports, Microservices, Dashboards, Applications), Understand World, Decisions and Actions, New Data]
  5. Quansight: continuing Continuum momentum
    [Timeline figure: 2012, 2015, 2018, 2019 and beyond; spin-outs]
    Key members of the management team at Continuum Analytics. Anaconda was our first (spin-out) company.
  6. What We Do: Connecting companies and communities
    We build and connect companies and open-source communities to sustainably solve problems with data.
  7. Core Business
    • Quansight Labs Membership
    • Staffing / Mentoring
    • Custom Data-Science/ML Consulting
    • Sustainable Open Source Partnerships
  8. Open Source Directions
    Webinar series to promote and encourage open-source roadmaps. We also help communities publicize these roadmaps.
  9. LABS: Sustaining the Future
    Open-source innovation and maintenance around the entire data-science and AI workflow.
    • NumPy ecosystem maintenance (fund developers)
    • Improve connection of NumPy to ML frameworks
    • GPU support for the NumPy ecosystem
    • Improve foundations of array computing
    • JupyterLab
    • Data catalog standards
    • Packaging (conda-forge, PyPA, etc.)
    • uarray: unified array interface and symbolic NumPy
    • xnd: re-factored NumPy (low-level cross-language libraries for N-D (tensor) computing)
    • Partnered with NumFOCUS and Ursa Labs (supporting Arrow)
  10. Python Data Analysis and Machine Learning Time-Line
    [Timeline figure spanning 1991 to 2018]
  11. Array Oriented Computing
    Empower domain experts with high-level tools that exploit modern hardware
  12. Why Array-oriented computing
    • Express domain knowledge directly in arrays (tensors, matrices, vectors), which makes it easier to teach programming in a domain
    • Can take advantage of parallelism and accelerators
    • Array expressions
    [Diagram: a collection of objects, each holding Attr1, Attr2, Attr3, versus one array per attribute (Attr1, Attr2, Attr3) spanning Object1 to Object6]
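A minimal sketch of the difference pictured above, using a hypothetical `Particle` record: with a list of objects, a Python loop reaches each attribute one object at a time; with one array per attribute, a single array expression updates every element at once.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Particle:
    # Hypothetical record: one Python object per item ("array of structs").
    x: float
    v: float


def step_objects(particles, dt):
    # Object layout: a Python loop visits each object and updates one attribute.
    for p in particles:
        p.x += p.v * dt
    return particles


def step_arrays(x, v, dt):
    # Array layout: one array per attribute ("struct of arrays");
    # a single expression updates every element.
    return x + v * dt


particles = step_objects([Particle(float(i), 2.0) for i in range(5)], 0.5)
x_new = step_arrays(np.arange(5, dtype=np.float64), np.full(5, 2.0), 0.5)
```

Both give the same positions; the array form is the one that vectorizing compilers and accelerators can exploit.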
  13. Reasons for array-oriented computing
    • Today's vector machines (and vector co-processors, or GPUs) were made for array-oriented computing.
    • The software stack has just not caught up, which is unfortunate because APL came out in 1963.
    • There is a reason Fortran remains popular among high-performance groups.
  14. Python and in particular PyData is Growing

  15. [Figure] Adapted from Jake VanderPlas' PyCon 2017 keynote

  16. • Conda: a cross-platform and language-agnostic package and environment manager
    • Conda Forge: a community-led collection of recipes, build infrastructure, and packages for conda
    • Conda Environments: custom isolated software sandboxes to allow easy reproducibility and sharing of data-science work
    • Anaconda.org: website for freely hosting public packages and environments; an example of a conda repository
  17. “conda – package everything”
    • Language independent
    • Platform independent
    • No special privileges required
    • No VMs or containers
    • Enables: reproducibility, collaboration, scaling
    [Figure: Conda sandboxing technology; isolated environments A (Python v2.7), B (Python v3.4, Jupyter) and C (R, R Essentials) side by side, holding different versions of NumPy (v1.10, v1.11) and Pandas (v0.16, v0.18)]
  18. Basic Conda Usage
    • Install a package: conda install sympy
    • List all installed packages: conda list
    • Search for packages: conda search llvm
    • Create a new environment: conda create -n py3k python=3
    • Remove a package: conda remove nose
    • Get help: conda install --help
  19. Advanced Conda Usage
    • Install a package in an environment: conda install -n py3k sympy
    • Update all packages: conda update --all
    • Export list of packages: conda list --export > packages.txt
    • Install packages from an export: conda install --file packages.txt
    • See package history: conda list --revisions
    • Revert to a revision: conda install --revision 23
    • Remove unused packages and cached tarballs: conda clean -pt
  20. Conda eases rapid deployment
    [Diagram: Development → Deployment]

  21. NumPy

  22. Without NumPy
    functions.py:

        from math import sin, pi

        def sinc(x):
            if x == 0:
                return 1.0
            else:
                pix = pi*x
                return sin(pix)/pix

        def step(x):
            if x > 0:
                return 1.0
            elif x < 0:
                return 0.0
            else:
                return 0.5

        >>> import functions as f
        >>> xval = [x/3.0 for x in range(-10,10)]
        >>> yval1 = [f.sinc(x) for x in xval]
        >>> yval2 = [f.step(x) for x in xval]

    Python is a great language but needed a way to operate quickly and cleanly over multi-dimensional arrays.
  23. With NumPy
    functions2.py:

        from numpy import sin, pi
        from numpy import vectorize
        import functions as f

        # vectorize wraps the scalar version: convenient, but still a Python-level loop
        vsinc = vectorize(f.sinc)

        def sinc(x):
            pix = pi*x
            val = sin(pix)/pix   # 0/0 at x == 0 gives nan (with a RuntimeWarning) ...
            val[x==0] = 1.0      # ... which is patched here
            return val

        vstep = vectorize(f.step)

        def step(x):
            y = x*0.0
            y[x>0] = 1
            y[x==0] = 0.5
            return y

        >>> import functions2 as f
        >>> import numpy as np
        >>> x = np.r_[-10:10]/3.0
        >>> y1 = f.sinc(x)
        >>> y2 = f.step(x)

    NumPy offers an N-D array, element-by-element functions, and basic random numbers, linear algebra, and FFT capability for Python. http://numpy.org
    Fiscally sponsored by NumFOCUS.
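For these two particular functions NumPy also ships ready-made ufuncs: `np.sinc` is the normalized sinc, sin(πx)/(πx), with the x = 0 case handled, and `np.heaviside(x, 0.5)` is a step that takes the value 0.5 at zero. A small check, assuming the slide's masked versions are copied in under the hypothetical names `sinc_masked` and `step_masked`, that both approaches agree on the slide's sample points:

```python
import numpy as np


def sinc_masked(x):
    # Slide-style version: divide everywhere, then patch the 0/0 entry.
    pix = np.pi * x
    with np.errstate(invalid="ignore"):  # silence the 0/0 warning at x == 0
        val = np.sin(pix) / pix
    val[x == 0] = 1.0
    return val


def step_masked(x):
    # Slide-style step: 0 for x < 0, 0.5 at x == 0, 1 for x > 0.
    y = x * 0.0
    y[x > 0] = 1
    y[x == 0] = 0.5
    return y


x = np.r_[-10:10] / 3.0  # the slide's sample points, including x == 0
assert np.allclose(sinc_masked(x), np.sinc(x))
assert np.array_equal(step_masked(x), np.heaviside(x, 0.5))
```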
  24. NumPy: an Array Extension of Python
    • Data: the array object
      – slicing and shaping
      – data-type map to bytes
    • Fast Math (ufuncs):
      – vectorization
      – broadcasting
      – aggregations
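A minimal illustration of the ufunc ideas listed above on a small integer array (the values are arbitrary): vectorization applies one call to every element, broadcasting stretches a shape-(3,) row across a shape-(2, 3) array, and aggregations reduce along a chosen axis.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]]
row = np.array([10, 20, 30])

squares = np.square(a)          # vectorization: one ufunc call, applied element by element
shifted = a + row               # broadcasting: `row` is reused for every row of `a`
col_sums = a.sum(axis=0)        # aggregation down each column
row_max = a.max(axis=1)         # aggregation across each row
nbytes = a.size * a.dtype.itemsize  # the dtype fixes how many bytes each element occupies
```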
  25. NumPy Array Key Attributes
    • dtype
    • shape
    • ndim
    • strides
    • data
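Inspecting these attributes on a small C-ordered `float64` array shows how they fit together: `strides` are the byte steps between neighbours along each axis, so a transposed view can reuse the same `data` buffer simply by swapping the strides. The array itself is just an example.

```python
import numpy as np

a = np.zeros((3, 4), dtype=np.float64)  # C (row-major) order by default

print(a.dtype)    # float64 (8 bytes per element)
print(a.shape)    # (3, 4)
print(a.ndim)     # 2
print(a.strides)  # (32, 8): 4 * 8 bytes to the next row, 8 bytes to the next column

t = a.T                        # the transpose is a view, not a copy
print(t.strides)               # (8, 32): same buffer, strides swapped
print(np.shares_memory(a, t))  # True
```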
  26. NumPy Examples
    [Slide images: 2d array and 3d array examples with printed outputs]
  27. NumPy Slicing (Selection)
        >>> a[0,3:5]
        array([3, 4])
        >>> a[4:,4:]
        array([[44, 45],
               [54, 55]])
        >>> a[:,2]
        array([ 2, 12, 22, 32, 42, 52])
        >>> a[2::2,::2]
        array([[20, 22, 24],
               [40, 42, 44]])
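The slide does not show how `a` was built. The outputs are consistent with a 6×6 array whose entry at row i, column j is 10*i + j; the sketch below assumes that layout, builds it by broadcasting a column against a row, and reproduces the four selections. It also shows that basic slices are views, not copies.

```python
import numpy as np

# Assumed layout: a[i, j] == 10*i + j, built by broadcasting a column against a row.
a = np.arange(6)[:, np.newaxis] * 10 + np.arange(6)

print(a[0, 3:5])     # [3 4]
print(a[4:, 4:])     # [[44 45]
                     #  [54 55]]
print(a[:, 2])       # [ 2 12 22 32 42 52]
print(a[2::2, ::2])  # [[20 22 24]
                     #  [40 42 44]]

# Basic slices are views: writing through one modifies the original array.
view = a[2::2, ::2]
view[0, 0] = -1
print(a[2, 0])       # -1
```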
  28. Summary
    • Provides a foundational N-dimensional array composed of homogeneous elements of a particular “dtype”
    • The set of dtypes is extensive (but difficult to extend)
    • Arrays can be sliced and diced with simple syntax to provide easy manipulation and selection
    • Provides fast and powerful math, statistics, and linear algebra functions that operate over arrays
    • Utilities for sorting, reading and writing data are also provided
  29. Scaling Up and Out with Numba and Dask

  30. Scale Up vs Scale Out
    • Scale Up (Bigger Nodes): big memory & many cores / GPU box → Numba
    • Scale Out (More Nodes): many commodity nodes in a cluster → Dask
    • Best of Both (e.g. GPU cluster) → Dask with Numba
  31. Development
    Name      Latest Release   Number of Releases   GitHub Stars   Contributors
    numba     0.40.0           113                  3,476          96
    dask      0.19.2           52                   3,507          195
    dask-ml   0.10.0           15                   104            23
    numpy     1.15.2           144                  8,298          694
    pandas    0.23.4           97                   16,276         1,285
    • Numba: http://numba.pydata.org, http://github.com/numba
    • Dask: http://dask.pydata.org, http://github.com/dask
    • Dask-ml: http://dask-ml.readthedocs.io/en/latest/index.html, http://github.com/dask/dask-ml
  32. Numba

  33. A Compiler for Python? Combining Productivity and Performance
    • Python is one of the most popular languages for data science
    • Python integrates well with compiled, accelerated libraries (MKL, TensorFlow, etc.)
    • But what about custom algorithms and data processing tasks?
    • Our goal was to make a compiler that:
      – worked within the standard Python interpreter rather than replacing it
      – integrated tightly with NumPy
      – was compatible with both multithreaded and distributed computing paradigms
  34. Numba: A JIT Compiler for Python
    • An open-source, function-at-a-time compiler library for Python
    • Compiler toolbox for different targets and execution models:
      – single-threaded CPU, multi-threaded CPU, GPU
      – regular functions, “universal functions” (array functions), etc.
    • Speedup: 2x (compared to basic NumPy code) to 200x (compared to pure Python)
    • Combines the ease of writing Python with speeds approaching Fortran
    • Empowers data scientists who make tools for themselves and other data scientists
  35. 7 things about Numba you may not know
    1. Numba is 100% Open Source
    2. Numba + Jupyter = Rapid CUDA Prototyping
    3. Numba can compile for the CPU and the GPU at the same time
    4. Numba makes array processing easy with @(gu)vectorize
    5. Numba comes with a CUDA Simulator
    6. You can send Numba functions over the network
    7. Numba developers are contributing to a GPU DataFrame (pygdf)
  36. Numba (compile Python to CPUs and GPUs)
    conda install numba
    [Diagram: Python → Parsing Frontend → Numba Intermediate Representation (IR) → Code Generation Backend → LLVM → x86, ARM, PTX]
  37. How does Numba work?
    [Diagram: Python function (bytecode) → Bytecode Analysis → Numba IR → Rewrite IR → Type Inference (using the function arguments) → Lowering → LLVM IR → LLVM/NVVM JIT → Machine Code → Execute!, with compiled code kept in a Cache]
        @jit
        def do_math(a, b):
            …
        >>> do_math(x, y)
  38. Supported Platforms and Hardware
    • OS: Windows (7 and later), OS X (10.9 and later), Linux (RHEL 6 and later)
    • HW: 32 and 64-bit CPUs (incl. Xeon Phi), CUDA & HSA GPUs, some support for ARM and ROCm
    • SW: Python 2.7, 3.4-3.7; NumPy 1.10 and later
  39. Basic Example

  40. Basic Example
    [Annotated code image. Callouts: Numba decorator (nopython=True not required); Array Allocation; Looping over ndarray x as an iterator; Using numpy math functions; Returning a slice of the array; 2.7x speedup!]
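The code on this slide is an image that did not survive in the transcript. A hypothetical function with the same ingredients (array allocation, looping over the elements of `x`, a NumPy math function, returning a slice) might look like the sketch below; the name `clipped_log_prefix` and its body are illustrative only, and no particular speedup is implied. The `except ImportError` fallback exists only so the sketch also runs without Numba, in which case it is plain, uncompiled Python.

```python
import numpy as np

try:
    from numba import jit
except ImportError:  # fallback only: no compilation, plain Python
    def jit(*args, **kwargs):
        if args and callable(args[0]):
            return args[0]
        return lambda f: f


@jit(nopython=True)
def clipped_log_prefix(x, n):
    # Hypothetical example: log1p of each element, capped at 2.0,
    # returning only the first n results.
    out = np.empty_like(x)                      # array allocation
    for i, value in enumerate(x):               # looping over the ndarray x
        out[i] = min(np.log1p(value), 2.0)      # NumPy math function inside the loop
    return out[:n]                              # returning a slice of the array


x = np.linspace(0.0, 10.0, 100_000)
result = clipped_log_prefix(x, 5)  # first call compiles; later calls reuse the machine code
```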
  41. Numba Features
    • Detects CPU model during compilation and optimizes for that target
    • Automatic type inference: no need to give type signatures for functions
    • Dispatches to multiple type-specializations for the same function
    • Call out to C libraries with CFFI and ctypes
    • Special “callback” mode for creating C callbacks to use with external libraries
    • Optional caching to disk, and ahead-of-time creation of shared libraries
    • Compiler is extensible with new data types and functions
  42. Parallel Computing
    • Three main technologies for parallelism: SIMD, multi-threading, distributed computing
    [Diagram: elements x0, x1, x2, x3 processed under each model]
  43. SIMD: Single Instruction Multiple Data
    • Numba's CPU detection will enable LLVM to autovectorize for the appropriate SIMD instruction set:
      – SSE, AVX, AVX2, AVX-512
    • Will become even more important as AVX-512 is now available on both Xeon Phi and Skylake Xeon processors
  44. Manual Multithreading: Release the GIL
    • Option to release the GIL
    • Using Python concurrent.futures
    [Chart: speedup ratio (0 to 3.5) vs. number of threads (1, 2, 4)]
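A sketch of the pattern this slide describes, assuming Numba's `nogil=True` option: the compiled function releases the GIL while it runs, so a standard `concurrent.futures.ThreadPoolExecutor` can work on several chunks of an array at once. The kernel `sum_of_squares`, the helper `threaded_sum_of_squares`, the thread count, and the array size are all hypothetical, and actual speedups depend on the machine. Without Numba, the fallback decorator leaves plain Python that is still correct but holds the GIL, so it gains nothing from the threads.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

try:
    from numba import jit
except ImportError:  # fallback only: plain Python, keeps the GIL, no speedup
    def jit(*args, **kwargs):
        return lambda f: f


@jit(nopython=True, nogil=True)
def sum_of_squares(chunk):
    # Hypothetical kernel; with nogil=True Numba releases the GIL while it runs.
    total = 0.0
    for value in chunk:
        total += value * value
    return total


def threaded_sum_of_squares(x, n_threads=4):
    chunks = np.array_split(x, n_threads)          # one contiguous piece per thread
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        partials = pool.map(sum_of_squares, chunks)
    return sum(partials)


x = np.random.default_rng(0).random(200_000)
sum_of_squares(x[:1])                 # optional warm-up: compile once before threading
total = threaded_sum_of_squares(x)
```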
  45. Universal Functions (Ufuncs)
    Ufuncs are a core concept in NumPy for array-oriented computing.
    ◦ A function with scalar inputs is broadcast across the elements of the input arrays:
      • np.add([1,2,3], 3) == [4, 5, 6]
      • np.add([1,2,3], [10, 20, 30]) == [11, 22, 33]
    ◦ Parallelism is present, by construction. Numba will generate loops and can automatically multi-thread if requested.
    ◦ Before Numba, creating fast ufuncs required writing C. No longer!
  46. Universal Functions (Ufuncs) Different decorator! 1.8x speedup!

  47. Multi-threaded Ufuncs
    • Specify type signature
    • Select parallel target
    • Automatically uses all CPU cores!
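A minimal sketch of the ufunc slides, assuming Numba's `vectorize` decorator: the kernel is written for scalars, an explicit type signature is supplied, and `target="parallel"` asks Numba to split the generated loop across CPU cores. The `rel_diff` kernel is hypothetical. The result is a NumPy ufunc, so it broadcasts like `np.add`. The `except ImportError` branch is a stand-in so the sketch still runs without Numba; `np.vectorize` is neither compiled nor parallel.

```python
import numpy as np

try:
    from numba import vectorize
except ImportError:
    def vectorize(signatures, target="cpu"):
        # Fallback only: np.vectorize is a Python-level loop, not compiled or parallel.
        return lambda f: np.vectorize(f, otypes=[np.float64])


@vectorize(["float64(float64, float64)"], target="parallel")
def rel_diff(a, b):
    # Hypothetical scalar kernel; Numba generates and threads the array loop.
    return abs(a - b) / max(abs(a), abs(b), 1e-12)


x = np.linspace(1.0, 2.0, 100_000)
y = x * 1.01
d = rel_diff(x, y)             # element-wise over two arrays
d_scalar = rel_diff(x, 2.0)    # a scalar broadcasts, like np.add([1, 2, 3], 3)
```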
  48. ParallelAccelerator
    • ParallelAccelerator is a special compiler pass contributed by Intel Labs
      – Todd A. Anderson, Ehsan Totoni, Paul Liu
    • Based on a similar contribution to Julia
    • Automatically generates multithreaded code in a Numba-compiled function:
      – Array expressions and reductions
      – Random functions
      – Dot products
      – Explicit loops indicated with prange() call
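A minimal sketch of the explicit-loop form, assuming Numba's `njit(parallel=True)` and `prange`: iterations of the outer `prange` loop are divided among threads, and the `+=` into the scalar `total` is treated as a reduction. The `mean_row_norm` kernel is hypothetical; it uses a 10-column input like the charts that follow, but far fewer rows. The fallback branch runs it serially when Numba is absent.

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:  # fallback only: serial Python, no threading
    prange = range

    def njit(*args, **kwargs):
        if args and callable(args[0]):
            return args[0]
        return lambda f: f


@njit(parallel=True)
def mean_row_norm(X):
    # Hypothetical kernel: average Euclidean norm of the rows of X.
    n_rows, n_cols = X.shape
    total = 0.0
    for i in prange(n_rows):        # iterations are split across threads
        s = 0.0
        for j in range(n_cols):
            s += X[i, j] * X[i, j]
        total += np.sqrt(s)         # recognized as a parallel reduction
    return total / n_rows


X = np.random.default_rng(1).random((20_000, 10))
result = mean_row_norm(X)
```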
  49. ParallelAccelerator: Example #1
    [Chart: time (ms) for NumPy, Numba, and Numba+PA; speedups 1.8x and 3.6x; 1000000x10 input, Core i7 Quad Core CPU]
  50. ParallelAccelerator: prange()
    [Chart: time (ms) for NumPy, Numba, and Numba+PA; speedups 4.3x and 50x; 1000000x10 input, Core i7 Quad Core CPU]
  51. ParallelAccelerator: prange()
    [Chart: time (ms) for NumPy, Numba, and Numba+PA; speedups 2x and 3.6x; 1000000x10 input, Core i7 Quad Core CPU]
  52. ParallelAccelerator: Image Resampling
    • Interactive image resampling with Holoviews + Datashader
    • Datashader resampling implemented with Numba + prange()
    https://github.com/bokeh/datashader/blob/master/examples/landsat.ipynb
  53. ParallelAccelerator: Stencils
    730x547 image with a 21x21 pixel blur: half the lines of code and 4x faster on a quad-core CPU than equivalent non-stencil Numba code
  54. Distributed Computing
    [Diagram, example with Dask: Dask Client (Haswell) → Dask Scheduler → Dask Workers (Skylake, Skylake, Knight's Landing), each running f(x)]
        @jit
        def f(x):
            …
    • Serialize with the pickle module
    • Works with Dask and Spark (and others)
    • Automatic recompilation for each target
  55. Other Numba topics
    • CUDA Python: write general GPU kernels with Python
    • Device Arrays: manage memory transfer from host to GPU
    • Streaming: manage asynchronous and parallel GPU compute streams
    • CUDA Simulator in Python: to help debug your kernels
    • HSA Support: early support for HSA-based GPUs and APUs
    • Pyculib: access to cuFFT, cuBLAS, cuSPARSE, cuRAND, CUDA Sorting
    https://github.com/ContinuumIO/gtc2017-numba
  56. Dask

  57. • Designed to parallelize the Python ecosystem
    • Handles complex algorithms
    • Co-developed with Pandas/SKLearn/Jupyter teams
    • Familiar APIs for Python users
    • Scales
      – Scales from multicore to 1000-node clusters
      – Resilient, responsive, and real-time
  58. • Parallelizes NumPy, Pandas, SKLearn
      – Satisfies a subset of these APIs
      – Uses these libraries internally
      – Co-developed with these teams
    • Task scheduler supports custom algorithms
      – Parallelize existing code
      – Build novel real-time systems
      – Arbitrary task graphs with data dependencies
      – Same scalability
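Underneath the collections, a Dask task graph is a plain Python dict mapping keys either to values or to task tuples `(function, *args)`, where an argument that names another key is a data dependency. A small hand-written graph (the keys and the `inc` helper are made up for illustration), run with Dask's threaded scheduler:

```python
from operator import add

from dask.threaded import get


def inc(i):
    return i + 1


# Hypothetical graph: 'y' depends on 'x'; 'z' depends on both 'x' and 'y'.
dsk = {
    "a": 1,
    "x": (inc, "a"),        # x = inc(a)
    "y": (inc, "x"),        # y = inc(x)
    "z": (add, "x", "y"),   # z = add(x, y) waits for both dependencies
}

z = get(dsk, "z")             # the scheduler walks the dependencies
both = get(dsk, ["x", "z"])   # several keys can be requested at once
```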
  59. Demo video
    • High level: Scaling Pandas
      – Same Pandas look and feel
      – Uses Pandas under the hood
      – Scales nicely onto many machines
    • Low level: Arbitrary task scheduling
      – Parallelize normal Python code
      – Build custom algorithms
      – React in real time
    • Demo deployed with
      – dask-kubernetes on Google Compute Engine
      – github.com/dask/dask-kubernetes
    • YouTube link: https://www.youtube.com/watch?v=ods97a5Pzw0
  60. Why do people choose Dask?
    • Familiar with Python:
      – Drop-in NumPy/Pandas/SKLearn APIs
      – Native memory environment
      – Easy debugging and diagnostics
    • Have complex problems:
      – Parallelize existing code without expensive rewrites
      – Sophisticated algorithms and systems
      – Real-time response to small data
    • Scales up and down:
      – Scales to 1000-node clusters
      – Also runs cheaply on a laptop
        # import pandas as pd
        import dask.dataframe as dd
  61. Dask
    • Started as part of Blaze in early 2014
    • General parallel programming engine
    • Flexible and therefore highly suited for
      – Commodity clusters
      – Advanced algorithms
    • Wide community adoption and use
        conda install -c conda-forge dask
        pip install dask[complete] distributed --upgrade
  62. [Figure: Big Data / Small Data, with Numba]

  63. Dask: From User Interaction to Execution
    [Diagram; label: delayed]

  64. Dask: Parallel Data Processing
    • Synthetic views of NumPy ndarrays
    • Synthetic views of Pandas DataFrames with HDFS support
    • DAG construction and workflow manager
  65. Dask DataFrame is like Pandas
        >>> import pandas as pd
        >>> df = pd.read_csv('iris.csv')
        >>> df.head()
           sepal_length  sepal_width  petal_length  petal_width      species
        0           5.1          3.5           1.4          0.2  Iris-setosa
        1           4.9          3.0           1.4          0.2  Iris-setosa
        2           4.7          3.2           1.3          0.2  Iris-setosa
        3           4.6          3.1           1.5          0.2  Iris-setosa
        4           5.0          3.6           1.4          0.2  Iris-setosa
        >>> max_sepal_length_setosa = df[df.species == 'Iris-setosa'].sepal_length.max()
        >>> max_sepal_length_setosa
        5.8

        >>> import dask.dataframe as dd
        >>> ddf = dd.read_csv('*.csv')
        >>> ddf.head()
           sepal_length  sepal_width  petal_length  petal_width      species
        0           5.1          3.5           1.4          0.2  Iris-setosa
        1           4.9          3.0           1.4          0.2  Iris-setosa
        2           4.7          3.2           1.3          0.2  Iris-setosa
        3           4.6          3.1           1.5          0.2  Iris-setosa
        4           5.0          3.6           1.4          0.2  Iris-setosa
        >>> d_max_sepal_length_setosa = ddf[ddf.species == 'Iris-setosa'].sepal_length.max()
        >>> d_max_sepal_length_setosa.compute()
        5.8
  66. Dask Graphs: Example Machine Learning Pipeline
    • New Spark/Hadoop clusters: create and provision a Spark/Hadoop cluster with a few simple steps
    • Work on the cloud or with your existing in-house servers
  67. Example 1: Using Dask DataFrames on a cluster with CSV data
    • Built from Pandas DataFrames
    • Match Pandas interface
    • Access data from HDFS, S3, local, etc.
    • Fast, low latency
    • Responsive user interface
  68. Dask Array is like NumPy
        >>> import numpy as np
        >>> np_ones = np.ones((5000, 1000))
        >>> np_ones
        array([[ 1.,  1.,  1., ...,  1.,  1.,  1.],
               [ 1.,  1.,  1., ...,  1.,  1.,  1.],
               [ 1.,  1.,  1., ...,  1.,  1.,  1.],
               ...,
               [ 1.,  1.,  1., ...,  1.,  1.,  1.],
               [ 1.,  1.,  1., ...,  1.,  1.,  1.],
               [ 1.,  1.,  1., ...,  1.,  1.,  1.]])
        >>> np_y = np.log(np_ones + 1)[:5].sum(axis=1)
        >>> np_y
        array([ 693.14718056,  693.14718056,  693.14718056,  693.14718056,  693.14718056])

        >>> import dask.array as da
        >>> da_ones = da.ones((5000000, 1000000), chunks=(1000, 1000))  # lazy: ~40 TB if ever materialized
        >>> da_y = da.log(da_ones + 1)[:5].sum(axis=1)
        >>> np_da_y = np.array(da_y)  # fits in memory: 5 values, each 1,000,000 * log(2) ≈ 693147.18
        >>> # If the result doesn't fit in memory:
        >>> da_y.to_hdf5('myfile.hdf5', 'result')
  69. Example 3: Using Dask Arrays with global temperature data
    • Built from NumPy n-dimensional arrays
    • Matches NumPy interface (subset)
    • Solve medium-large problems
    • Complex algorithms
  70. Dask Schedulers: Distributed Scheduler

  71. Dask Scheduler
    • Scheduling arbitrary graphs is hard.
      – Optimal graph scheduling is NP-hard
      – Scalable scheduling requires linear-time solutions
    • Fortunately Dask does well with a lot of heuristics
      – … and a lot of monitoring and data about sizes
      – … and how long functions take.
  72. Cluster Architecture Diagram
    [Client Machine → Head Node → three Compute Nodes]
  73. Using Anaconda and Dask on your Cluster
    • Single machine with multiple threads or processes
    • On a cluster with SSH (dcluster)
    • Resource management: YARN (knit), SGE, Slurm
    • On the cloud with Amazon EC2 (dec2) or Google CE
    • On a cluster with Anaconda for cluster management
      – Manage multiple conda environments and packages on bare-metal or cloud-based clusters
  74. Scheduler Visualization with Bokeh

  75. What makes Dask different? Let's look at some pictures of directed graphs.
  76. [Figure: directed graph]
  77. [Figure: directed graph]
  78. Most Parallel Framework Architectures
    User API → High Level Representation (Logical Plan) → Low Level Representation (Physical Plan) → Task scheduler for execution
  79. SQL Database Architecture
        SELECT avg(value)
        FROM accounts
        INNER JOIN customers ON …
        WHERE name == 'Alice'
  80. SQL Database Architecture
        SELECT avg(value)
        FROM accounts
        WHERE name == 'Alice'
        INNER JOIN customers ON …
    Optimize (the filter now runs before the join)
  81. Spark Architecture df.join(df2, …) .select(…) .filter(…) Optimize

  82. Large Matrix Architecture (A’ * A) \ A’ * b

    Optimize
  83. Dask Architecture

  84. Dask Architecture
        accts = dd.read_parquet(…)
        accts = accts[accts.name == 'Alice']
        df = dd.merge(accts, customers)
        df.value.mean().compute()

  85. Dask Architecture
        u, s, v = da.linalg.svd(X)
        Y = u.dot(da.diag(s)).dot(v)   # v already holds the right singular vectors as rows
        da.linalg.norm(X - Y)
  86. Dask Architecture
        for i in range(256):
            x = dask.delayed(f)(i)
            y = dask.delayed(g)(x)
            z = dask.delayed(add)(x, y)
  87. Dask Architecture
        async def func():
            client = await Client(asynchronous=True)
            futures = client.map(…)
            async for f in as_completed(…):
                result = await f
  88. Dask Architecture Your own system here

  89. By dropping the high level representation
    Costs
    • Lose specialization
    • Lose opportunities for high level optimization
    Benefits
    • Become generalists
    • More flexibility for new domains and algorithms
    • Access to smarter algorithms
    • Better task scheduling: resource constraints, GPUs, multiple clients, async-real-time, etc.
  90. Ten Reasons People Choose Dask

  91. Scalable Pandas DataFrames
    • Same API
        import dask.dataframe as dd
        df = dd.read_parquet('s3://bucket/accounts/2017')
        df.groupby(df.name).value.mean().compute()
    • Efficient Timeseries Operations
        df.loc['2017-01-01']                # Uses the Pandas index…
        df.value.rolling(10).std()          # for efficient…
        df.value.resample('10min').mean()   # operations.
    • Co-developed with Pandas and by the Pandas developer community
  92. Scalable NumPy Arrays
    • Same API
        import dask.array as da
        x = da.from_array(my_hdf5_file, chunks=(1000, 1000))  # e.g. an h5py dataset
        y = x.dot(x.T)
    • Applications
      – Atmospheric science
      – Satellite imagery
      – Biomedical imagery
      – Optimization algorithms: check out dask-glm
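A self-contained variant of the snippet above, assuming an in-memory NumPy array as a stand-in for the HDF5 dataset (`da.from_array` wraps any array-like that supports NumPy-style slicing, which is what lets h5py datasets work). The shapes and chunk sizes are arbitrary.

```python
import numpy as np
import dask.array as da

data = np.random.default_rng(2).random((2_000, 500))  # stand-in for an HDF5 dataset
x = da.from_array(data, chunks=(500, 500))            # a 4 x 1 grid of NumPy blocks
y = x.dot(x.T)                                         # lazy (2000, 2000) result
result = y.compute()                                   # blocks are multiplied in parallel
```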
  93. Parallelize Scikit-Learn/Joblib
    • Scikit-Learn parallelizes with Joblib
        estimator = RandomForestClassifier(n_jobs=8)   # other parameters elided (…)
        estimator.fit(train_data, train_labels)
    • Joblib can use Dask
        from dask.distributed import Client
        from joblib import parallel_backend
        client = Client(…)                             # connect to the Dask scheduler
        with parallel_backend('dask'):
            estimator.fit(train_data, train_labels)
    https://pythonhosted.org/joblib/
    http://distributed.readthedocs.io/en/latest/joblib.html
    [Diagram: Joblib → thread pool]
  94. Parallelize Scikit-Learn/Joblib
    [Same code as slide 93; the diagram now shows Joblib dispatching to Dask instead of a thread pool]
  95. Many Other Libraries in Anaconda
    • Scikit-Image uses Dask to break down images and speed up algorithms with overlapping regions
    • GeoPandas can use Dask to partition data spatially and accelerate spatial joins
  96. Dask Scales Up
    • Thousand-node clusters
    • Cloud computing
    • Supercomputers
    • Gigabyte/s bandwidth
    • 200 microsecond task overhead
    Dask Scales Down (the median cluster size is one)
    • Can run in a single Python thread pool
    • Almost no performance penalty (microseconds)
    • Lightweight
    • Few dependencies
    • Easy install
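To illustrate the scale-down point, the same `dask.delayed` code can run on Dask's single-threaded synchronous scheduler (handy for debugging with ordinary tracebacks) or on the default local thread pool, with no cluster involved. The `load` and `summarize` functions are hypothetical placeholders.

```python
import dask


@dask.delayed
def load(i):
    # Hypothetical stand-in for reading one partition of data.
    return list(range(i * 10, i * 10 + 10))


@dask.delayed
def summarize(parts):
    # Combine all partitions into one result.
    return sum(sum(p) for p in parts)


total = summarize([load(i) for i in range(4)])   # builds a lazy task graph

serial = total.compute(scheduler="synchronous")  # one thread, no pool at all
threaded = total.compute(scheduler="threads")    # local thread pool
```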
  97. Parallelize Web Backends
    • Web servers process thousands of small computations asynchronously for web pages or REST endpoints
    • Dask provides dynamic, heterogeneous computation
    • Supports small data
    • 10ms roundtrip times
    • Dynamic scaling for different loads
    • Supports asynchronous Python (like GoLang)
        async def serve(request):
            future = dask_client.submit(process, request)
            result = await future
            return result
  98. Debugging support
    • Clean Python tracebacks when user code breaks
    • Connect to remote workers with IPython sessions for advanced debugging
  99. Resource constraints
    • Define limited hardware resources for workers
    • Specify resource constraints when submitting tasks
        $ dask-worker … --resources GPU=2
        $ dask-worker … --resources GPU=2
        $ dask-worker … --resources special-db=1
        future = client.submit(my_function, resources={'GPU': 1})
    • Used for GPUs, big-memory machines, special hardware, database connections, I/O machines, etc.
  100. Collaboration
    • Many users can share the same cluster simultaneously
    • Define public datasets
    • Repeated computation and data use is shared among everyone
        df = dd.read_parquet(…).persist()
        client.publish_dataset(accounts=df)
        df = client.get_dataset('accounts')
  101. Beautiful Diagnostic Dashboards
    • Fast responsive dashboards
    • Provide users performance insight
    • Powered by Bokeh
  102. Some Reasons not to Choose Dask

  103. Dask's limitations
    • Dask is not a SQL database.
      Does Pandas well, but won't optimize complex queries.
    • Dask is not MPI.
      Very fast, but does leave some performance on the table: 200us task overhead, a couple of copies in the network stack.
    • Dask is not a JVM technology.
      It's a Python library (although Julia bindings are available).
    • Dask is not always necessary.
      You may not need parallelism.
  104. dask.pydata.org
    conda install dask