GPU accelerating your computation in Python
There are many powerful libraries in the Python ecosystem for accelerating the computation of large arrays with GPUs. We have CuPy for GPU array computation, Dask for distributed computation, cuML for machine learning, PyTorch for deep learning, and more. We will dig into how these libraries can be used together to accelerate geoscience workflows and how we are working with projects like Xarray to integrate these libraries with domain-specific tooling. The sgkit project is already providing this for the field of genetics, and we are excited to be working with community groups like Pangeo to bring this kind of tooling to the geosciences.
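The API parity described above is the core idea: CuPy mirrors the NumPy interface, so the same array code can run on CPU or GPU. A minimal sketch (assuming CuPy is installed where a GPU is available; it falls back to NumPy otherwise):

```python
import numpy as np

# CuPy mirrors the NumPy API, so identical code runs on either device.
# Fall back to NumPy when CuPy (or a GPU) is not available.
try:
    import cupy as xp
except ImportError:
    xp = np

# The same operations work regardless of which backend `xp` points at
a = xp.linspace(0.0, 1.0, 1_000_000)
norm = float(xp.sqrt((a ** 2).sum()))
```

Swapping the import is often the only change needed to move an existing NumPy workload onto the GPU.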

How to cite: Tomlinson, J.: Distributing your GPU array computation in Python, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-7610, https://doi.org/10.5194/egusphere-egu22-7610, 2022.

Jacob Tomlinson

May 25, 2022
Transcript

  1. Jacob Tomlinson, Senior Software Engineer, RAPIDS; Dask Core Maintainer. GPU accelerating your computation in Python. EGU General Assembly 2022. EGU22-7610, https://doi.org/10.5194/egusphere-egu22-7610, 2022.
  2. 2 RAPIDS https://github.com/rapidsai

  3. 3 Jake VanderPlas - PyCon 2017

  4. Open Source Data Science Ecosystem: Familiar Python APIs (CPU memory). Analytics and data preparation: Pandas. Machine learning and model training: Scikit-Learn. Graph analytics: NetworkX. Deep learning: PyTorch, TensorFlow, MXNet. Visualization: Matplotlib. Scaling: Dask.
  5. RAPIDS: End-to-End Accelerated GPU Data Science (GPU memory). Analytics and data preparation: cuDF, cuIO. Machine learning and model training: cuML. Graph analytics: cuGraph. Deep learning: PyTorch, TensorFlow, MXNet. Visualization: cuxfilter, pyViz, plotly. Scaling: Dask.
  6. RAPIDS Matches Common Python APIs: CPU-based Clustering

     from sklearn.datasets import make_moons
     import pandas

     X, y = make_moons(n_samples=int(1e2), noise=0.05, random_state=0)
     X = pandas.DataFrame({'fea%d' % i: X[:, i] for i in range(X.shape[1])})

     from sklearn.cluster import DBSCAN
     dbscan = DBSCAN(eps=0.3, min_samples=5)
     y_hat = dbscan.fit_predict(X)
  7. RAPIDS Matches Common Python APIs: GPU-accelerated Clustering

     from sklearn.datasets import make_moons
     import cudf

     X, y = make_moons(n_samples=int(1e2), noise=0.05, random_state=0)
     X = cudf.DataFrame({'fea%d' % i: X[:, i] for i in range(X.shape[1])})

     from cuml import DBSCAN
     dbscan = DBSCAN(eps=0.3, min_samples=5)
     y_hat = dbscan.fit_predict(X)
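The abstract also mentions Dask for distributed computation: Dask splits an array into chunks and evaluates them lazily, across cores or machines, and on a GPU cluster the same task graph can run with CuPy-backed chunks instead of NumPy ones. A minimal sketch of the chunked-array idea (using NumPy-backed chunks so it runs anywhere, with a plain NumPy fallback if Dask is not installed):

```python
import numpy as np

try:
    import dask.array as da
    # Lazy, chunked array: each 1000x1000 chunk is an independent task,
    # so a distributed scheduler can spread the work across workers.
    # On a GPU cluster the same graph runs with CuPy-backed chunks.
    x = da.ones((4_000, 4_000), chunks=(1_000, 1_000))
    total = float((x + x.T).sum().compute())
except ImportError:
    # Fallback so the sketch also runs without Dask installed
    x = np.ones((4_000, 4_000))
    total = float((x + x.T).sum())
```

Nothing is computed until `.compute()` is called, which is what lets Dask schedule the chunked work wherever the resources are.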
  8. Benchmarks: Single-GPU cuML vs. Scikit-learn. 1x V100 vs. 2x 20-core CPUs (DGX-1, RAPIDS 0.15).
  9. RAPIDS Everywhere: The Next Phase of RAPIDS. Exactly as it sounds: our goal is to make RAPIDS as usable and performant as possible wherever science is done. We will continue to work with more open source projects to further democratize acceleration and efficiency in science.
  10. sgkit: Statistical genetics toolkit in Python.

  11. Work with us: Everyone Can Help! Integrations, feedback, documentation support, pull requests, new issues, and code donations welcomed! Apache Arrow: https://arrow.apache.org/ (@ApacheArrow). GPU Open Analytics Initiative: http://gpuopenanalytics.com/ (@GPUOAI). RAPIDS: https://rapids.ai (@RAPIDSai). Dask: https://dask.org (@Dask_dev).
  12. THANK YOU. Jacob Tomlinson, jtomlinson@nvidia.com, @_jacobtomlinson.