OmniSci
April 23, 2019

End-to-End Open Source Data Science Workflow Using OmniSci and Nvidia RAPIDS

Presented at the Global Artificial Intelligence Conference in San Diego on April 23, 2019.

Transcript

  1. End-to-End Open Source Data Science Workflow Using OmniSci and Nvidia RAPIDS

    Global Artificial Intelligence Conference, San Diego. Veda Shankar, OmniSci | April 23rd, 2019
  2. Data Grows Faster Than CPU Processing (© OmniSci 2018)

    Data growth: 40% per year. CPU processing power: 20% per year.
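Taking the slide's two rates at face value and assuming both compound annually at a constant rate (an assumption; the slide gives only the yearly figures), data volume outpaces CPU power by a factor of 1.4 / 1.2 ≈ 1.17 each year. A small sketch of that arithmetic, with hypothetical helper names:

```python
import math

def data_to_cpu_ratio(years, data_growth=0.40, cpu_growth=0.20):
    """Ratio of data volume to CPU processing power after `years`, relative to
    today, assuming constant compound annual growth rates."""
    return ((1 + data_growth) / (1 + cpu_growth)) ** years

def years_until_gap(factor, data_growth=0.40, cpu_growth=0.20):
    """Years until data outgrows CPU power by `factor`, under the same assumption."""
    return math.log(factor) / math.log((1 + data_growth) / (1 + cpu_growth))

print(f"gap after 5 years: {data_to_cpu_ratio(5):.2f}x")  # 2.16x
print(f"years until a 2x gap: {years_until_gap(2):.1f}")  # 4.5
```

Under these assumptions the gap roughly doubles every four and a half years.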
  3. OmniSci Innovations Powering Extreme Analytics

    3-Tier Memory Caching • Query Compilation • In-Situ Rendering
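The slide names 3-tier memory caching without describing it. As a rough illustration only, the toy model below assumes three tiers (a small GPU-memory tier, a larger CPU-memory tier, and backing storage), promotes a chunk to the GPU tier whenever it is read, and demotes least-recently-used chunks one tier down. The class and method names are hypothetical; this is not OmniSci's implementation.

```python
from collections import OrderedDict

class ThreeTierCache:
    """Toy model of a GPU / CPU / storage hierarchy with LRU eviction per tier."""

    def __init__(self, gpu_slots, cpu_slots, storage):
        self.capacity = {"gpu": gpu_slots, "cpu": cpu_slots}
        self.tiers = {"gpu": OrderedDict(), "cpu": OrderedDict()}
        self.storage = storage  # dict standing in for chunks persisted on disk

    def _put(self, tier, key, value):
        entries = self.tiers[tier]
        entries[key] = value
        entries.move_to_end(key)  # mark as most recently used
        if len(entries) > self.capacity[tier]:
            old_key, old_value = entries.popitem(last=False)  # least recently used
            if tier == "gpu":
                self._put("cpu", old_key, old_value)  # demote from GPU to CPU memory
            # A chunk evicted from CPU memory is simply dropped; storage still has it.

    def get(self, key):
        """Return (value, tier where it was found) and promote the chunk to GPU memory."""
        for tier in ("gpu", "cpu"):
            if key in self.tiers[tier]:
                value = self.tiers[tier].pop(key)
                break
        else:
            tier, value = "storage", self.storage[key]
        self._put("gpu", key, value)
        return value, tier
```

The second element of each `get` result shows which tier served the read, which is the hit/miss behavior such a hierarchy is meant to exploit.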
  4. Three Ways to Get Started

    • Open Source: GitHub repo • OmniSci Cloud: OmniSci as a service • Enterprise: contact sales
  5. pymapd

    • The pymapd client interface provides a Python DB API 2.0-compliant interface to OmniSci. • pymapd provides methods to get results in the Apache Arrow-based GDF format for efficient data interchange with ML libraries (XGBoost, H2O). • Reference blogs: ◦ Using pymapd to Load Data to OmniSci Cloud
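Because pymapd follows DB API 2.0, querying it uses the standard connection/cursor pattern. The sketch below is written against that generic interface; with pymapd, `con` would come from `pymapd.connect(...)`, but here the standard library's `sqlite3` stands in so the example runs without an OmniSci server. The `flights` table, its columns, and the helper name `fetch_all` are hypothetical.

```python
import sqlite3

def fetch_all(con, sql):
    """Execute `sql` on any DB API 2.0 connection; return (column_names, rows)."""
    cur = con.cursor()
    try:
        cur.execute(sql)
        # The first field of each cursor.description entry is the column name.
        columns = [col[0] for col in cur.description]
        rows = cur.fetchall()
    finally:
        cur.close()
    return columns, rows

# sqlite3 stands in for an OmniSci server so the sketch is runnable anywhere.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE flights (carrier TEXT, depdelay INTEGER)")
con.executemany("INSERT INTO flights VALUES (?, ?)", [("AA", 5), ("UA", -3), ("AA", 12)])
cols, rows = fetch_all(
    con, "SELECT carrier, AVG(depdelay) AS avg_delay FROM flights GROUP BY carrier ORDER BY carrier"
)
print(cols, rows)  # ['carrier', 'avg_delay'] [('AA', 8.5), ('UA', -3.0)]
```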
  6. OmniSci pymapd Demo

    • Jupyter Notebook: https://github.com/omnisci/pymapd-workshop/blob/master/pymapd_usage.ipynb • Connect to OmniSci database • List tables in the database • Get table details • Run query and save results in a dataframe • Create table • Load data to table
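The demo's steps map onto pymapd `Connection` methods. A minimal sketch, assuming `con` is an open pymapd connection and that the table names and query are placeholders you supply. `select_ipc` returns a pandas DataFrame over Arrow shared memory, so it generally needs the client on the same host as the OmniSci server. The helper name `pymapd_demo_steps` is hypothetical.

```python
def pymapd_demo_steps(con, source_table, query, new_table):
    """Mirror the notebook's steps on an open pymapd Connection `con`.

    Assumes `con` came from pymapd.connect(user=..., password=..., host=..., dbname=...).
    Returns (table names, column details of source_table, query result DataFrame).
    """
    tables = con.get_tables()                      # list tables in the database
    details = con.get_table_details(source_table)  # column names and types of one table
    df = con.select_ipc(query)                     # run query; result as a pandas DataFrame
    if new_table not in tables:
        con.create_table(new_table, df)            # create a table with a schema inferred from df
    con.load_table(new_table, df)                  # load the DataFrame's rows into that table
    return tables, details, df
```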
  7. GPU Open Analytics Initiative (GOAI)

    Seamless data interchange framework in GPU memory
  8. Unifying GPU-accelerated Analytics and Data Science

    ✔ With OmniSci’s Arrow-capable Python API (and via Ibis), OmniSci can output results directly to cuDF and integrate with RAPIDS via Python (requires pymapd 0.7.0 or higher). ✔ OmniSci’s JupyterLab integration (and support for Altair and Ibis) allows for connecting, querying, in-notebook visualization, and extraction of data. OmniSci User Defined Functions (coming 2019) will allow deeper, lower-level integration with RAPIDS libraries. Altair: https://altair-viz.github.io/ Ibis: http://ibis-project.org/ Diagram: OmniSci query result set in-GPU to RAPIDS; GPU-resident outputs from RAPIDS ML algorithms.
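The in-GPU handoff described above goes through pymapd's `select_ipc_gpu`, which in pymapd 0.7.0 and later returns the result set as a cuDF DataFrame that stays in GPU memory. A minimal sketch, assuming an open pymapd connection `con` and a CUDA GPU with cuDF installed on the client; the wrapper name `query_to_cudf` is hypothetical.

```python
def query_to_cudf(con, sql, device_id=0):
    """Run `sql` on OmniSci and return the result as a GPU-resident cudf.DataFrame.

    Assumes `con` is a pymapd (>= 0.7.0) Connection and that `device_id`
    names the GPU that should receive the result set.
    """
    return con.select_ipc_gpu(sql, device_id=device_id)
```

For example, `gdf = query_to_cudf(con, "SELECT depdelay, distance FROM flights")` would give RAPIDS a frame already in GPU memory; the column names here are hypothetical.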
  9. OmniSci pymapd ML Demo

    • Jupyter Notebook: https://github.com/omnisci/pymapd-workshop/blob/master/flights_depdelay_cudf.ipynb • Connect to OmniSci database • Query departure delay & other features from flights table • Read data from query into cuDF dataframe • Prep the dataframe for model analysis • Use OLS (Ordinary Least Squares) to find feature impact on departure delay
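The OLS step can be sketched with NumPy's least-squares solver. This is a CPU sketch, not necessarily what the notebook does: it assumes the feature columns and departure delay have already been pulled out of the cuDF frame into arrays (for example via cuDF's `to_pandas()`), drops rows with missing values as a minimal stand-in for the prep step, and returns one coefficient per feature. The function name is hypothetical.

```python
import numpy as np

def ols_feature_impact(X, y):
    """Ordinary least squares fit of y ≈ b0 + X @ b.

    X: (n_samples, n_features) features; y: (n_samples,) target such as departure delay.
    Rows containing NaN or inf are dropped first. Returns (intercept, coefficients).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.ndim == 1:
        X = X.reshape(-1, 1)  # single feature given as a flat array
    keep = np.isfinite(X).all(axis=1) & np.isfinite(y)
    X, y = X[keep], y[keep]
    # Prepend a column of ones so the first coefficient is the intercept.
    A = np.column_stack([np.ones(len(X)), X])
    coef, _, _, _ = np.linalg.lstsq(A, y, rcond=None)
    return coef[0], coef[1:]
```

Each coefficient estimates the change in departure delay per unit change in that feature with the others held fixed, so comparing their sizes is only meaningful when the features share a scale.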
  10. Next Steps

    • omnisci.com/blog: Read interesting stories on product usage • omnisci.com/demos: Play with our live demos for yourself! • omnisci.cloud: Get an OmniSci instance in 60 seconds • omnisci.com/platform/downloads/: Download a 30-day trial of OmniSci • community.omnisci.com: Ask questions and share your experiences