GPU accelerating your computation in Python

There are many powerful libraries in the Python ecosystem for accelerating the computation of large arrays with GPUs: CuPy for GPU array computation, Dask for distributed computation, cuML for machine learning, PyTorch for deep learning, and more. We will dig into how these libraries can be used together to accelerate geoscience workflows and how we are working with projects like Xarray to integrate them with domain-specific tooling. sgkit already provides this for the field of genetics, and we are excited to be working with community groups like Pangeo to bring this kind of tooling to the geosciences.

How to cite: Tomlinson, J.: Distributing your GPU array computation in Python, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-7610, https://doi.org/10.5194/egusphere-egu22-7610, 2022.
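
As a rough illustration of the pattern described above (a sketch by the editor, not part of the abstract or the talk), the snippet below combines two of the libraries mentioned: a Dask array whose chunks are CuPy arrays, so familiar NumPy-style code runs on the GPU and can be distributed. It assumes a CUDA-capable GPU with cupy and dask installed.

import cupy
import dask.array as da

# Ask Dask to draw random chunks with CuPy's random state, so every chunk
# lives in GPU memory rather than host memory.
rs = da.random.RandomState(RandomState=cupy.random.RandomState)
x = rs.normal(10, 1, size=(10_000, 10_000), chunks=(2_500, 2_500))

# Familiar array operations; each chunk is computed on the GPU.
result = (x + x.T).mean(axis=0)
print(result[:5].compute())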

Jacob Tomlinson

May 25, 2022

Transcript

  1. Jacob Tomlinson
    Senior Software Engineer, RAPIDS
    Dask Core Maintainer
    GPU accelerating your computation in Python
    EGU General Assembly 2022
    EGU22-7610, https://doi.org/10.5194/egusphere-egu22-7610, 2022.

  2. 2
    RAPIDS
    https://github.com/rapidsai

  3. 3
    Jake VanderPlas - PyCon 2017

  4. 4
    Open Source Data Science Ecosystem
    Familiar Python APIs
    Data Preparation, Model Training, Visualization in CPU memory, scaled with Dask
    Analytics: Pandas
    Machine Learning: Scikit-Learn
    Graph Analytics: NetworkX
    Deep Learning: PyTorch, TensorFlow, MXNet
    Visualization: Matplotlib

  5. 5
    RAPIDS
    End-to-End Accelerated GPU Data Science
    Data Preparation, Model Training, Visualization in GPU memory, scaled with Dask
    Analytics: cuDF, cuIO
    Machine Learning: cuML
    Graph Analytics: cuGraph
    Deep Learning: PyTorch, TensorFlow, MXNet
    Visualization: cuxfilter, pyViz, plotly
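
    As a hedged aside, not a slide from the deck: the same familiar-API idea applies to
    dataframes, where cuDF mirrors pandas. A minimal sketch, assuming a CUDA-capable GPU
    with cudf installed; the column names are made up.

    import cudf

    df = cudf.DataFrame({"site": ["a", "b", "a", "b"],
                         "temp": [11.2, 9.8, 12.1, 10.4]})
    # Same groupby/aggregation syntax as pandas, executed on the GPU.
    print(df.groupby("site")["temp"].mean())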

  6. 6
    RAPIDS Matches Common Python APIs
    CPU-based Clustering

    from sklearn.datasets import make_moons
    import pandas

    X, y = make_moons(n_samples=int(1e2), noise=0.05, random_state=0)
    X = pandas.DataFrame({'fea%d' % i: X[:, i] for i in range(X.shape[1])})

    from sklearn.cluster import DBSCAN

    dbscan = DBSCAN(eps=0.3, min_samples=5)
    y_hat = dbscan.fit_predict(X)

  7. 7
    RAPIDS Matches Common Python APIs
    GPU-accelerated Clustering

    from sklearn.datasets import make_moons
    import cudf

    X, y = make_moons(n_samples=int(1e2), noise=0.05, random_state=0)
    X = cudf.DataFrame({'fea%d' % i: X[:, i] for i in range(X.shape[1])})

    from cuml import DBSCAN

    dbscan = DBSCAN(eps=0.3, min_samples=5)
    y_hat = dbscan.fit_predict(X)
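
    The single-GPU example above can also be scaled out with Dask, as the abstract
    describes. A minimal sketch, not taken from the deck: dask_cuda starts one Dask
    worker per GPU and dask_cudf partitions a cuDF dataframe across them. The file
    pattern and column names are illustrative only.

    from dask_cuda import LocalCUDACluster
    from dask.distributed import Client
    import dask_cudf

    cluster = LocalCUDACluster()  # one Dask worker per visible GPU
    client = Client(cluster)

    # Partitioned GPU dataframe; operations run in parallel across the workers.
    ddf = dask_cudf.read_csv("measurements-*.csv")  # illustrative file pattern
    print(ddf.groupby("station")["value"].mean().compute())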

  8. 8
    Benchmarks: Single-GPU cuML vs Scikit-learn
    1x V100 vs. 2x 20 Core CPUs (DGX-1, RAPIDS 0.15)

  9. 9
    RAPIDS Everywhere
    The Next Phase of RAPIDS
    Exactly as it sounds: our goal is to make RAPIDS as usable and
    performant as possible wherever science is done. We will continue
    to work with more open source projects to further democratize
    acceleration and efficiency in science.

  10. 10
    sgkit: a statistical genetics toolkit in Python

  11. 11
    Work with us: Everyone Can Help!
    Integrations, feedback, documentation support, pull requests, new issues, or code donations welcomed!
    RAPIDS: https://rapids.ai (@RAPIDSai)
    Dask: https://dask.org (@Dask_dev)
    Apache Arrow: https://arrow.apache.org/ (@ApacheArrow)
    GPU Open Analytics Initiative: http://gpuopenanalytics.com/ (@GPUOAI)

  12. THANK YOU
    Jacob Tomlinson
    [email protected]
    @_jacobtomlinson
