OmniSci
August 22, 2019

End-to-End Open Source Data Science Workflow Using OmniSci and Nvidia RAPIDS

Presented at the 7th Annual Global Big Data Conference on Aug 22nd, 2019, at the Santa Clara Convention Center

Transcript

  1. End-to-End Open Source Data Science Workflow Using OmniSci and Nvidia RAPIDS
    7th Annual Global Big Data Conference
    Veda Shankar, OmniSci | Aug 22nd, 2019
  2. Data Grows Faster Than CPU Processing
    • Data Growth: 40% per year
    • CPU Processing Power: 20% per year
    © OmniSci 2018
  3. RAPIDS: The New GPU Data Science Pipeline
    from: https://github.com/rapidsai
  4. Unifying GPU-accelerated Analytics and Data Science
    • OmniSci query result set in-GPU to RAPIDS
    • GPU-resident outputs from RAPIDS ML algorithms
  5. OmniSci Innovations Powering Extreme Analytics
    • 3-Tier Memory Caching
    • Query Compilation
    • In-Situ Rendering
  6. Three Ways to Get Started
    • Open Source: GitHub repo
    • OmniSci Cloud: OmniSci as a service
    • Enterprise: Contact sales
  7. pymapd
    • The pymapd client interface provides a Python DB API 2.0-compliant OmniSci interface.
    • pymapd provides methods to get results in the Apache Arrow-based GDF format for efficient data interchange with ML libraries (XGBoost, H2O).
    • Reference blogs
      ◦ Using pymapd to Load Data to OmniSci Cloud
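Because the slide describes pymapd as DB API 2.0-compliant, the basic query pattern can be sketched with any DB API 2.0 driver. The example below is a minimal illustration, not taken from the deck: it uses the standard-library `sqlite3` module as a stand-in so it runs without an OmniSci server, and the `fetch_rows` helper, the `flights_demo` table, and its values are hypothetical.

```python
import sqlite3


def fetch_rows(con, query):
    """Run a query through the DB API 2.0 cursor interface and return all rows."""
    cur = con.cursor()
    try:
        cur.execute(query)
        return cur.fetchall()
    finally:
        cur.close()


# Stand-in connection (assumption): sqlite3 is used only so the sketch is runnable.
# With pymapd, the connection would instead come from pymapd.connect(...),
# using the credentials of your own OmniSci instance.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE flights_demo (carrier TEXT, depdelay INTEGER)")
con.executemany(
    "INSERT INTO flights_demo VALUES (?, ?)",
    [("AA", 5), ("UA", 12)],  # hypothetical rows, not real flight data
)

rows = fetch_rows(con, "SELECT carrier, depdelay FROM flights_demo ORDER BY carrier")
print(rows)
```

The helper only relies on `cursor()`, `execute()`, and `fetchall()`, which is the portion of the interface that DB API 2.0 standardizes.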
  8. OmniSci Pymapd Demo
    • Jupyter Notebook: https://github.com/omnisci/pymapd-workshop/blob/master/pymapd_usage.ipynb
    • Connect to OmniSci database
    • List tables in the database
    • Get table details
    • Run query and save results in a dataframe
    • Create table
    • Load data to table
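The demo steps listed above can be sketched roughly as follows. This is not the notebook's code: `StubConnection` is a hypothetical stand-in that mimics the pymapd connection methods used here (`get_tables`, `get_table_details`, `execute`, `load_table`) so the example runs without a server, and the table names, columns, and rows are made up. The sketch keeps query results as a plain list where the notebook uses a dataframe.

```python
from collections import namedtuple

Column = namedtuple("Column", ["name", "type"])


class StubConnection:
    """Hypothetical stand-in for a pymapd connection (assumption, for illustration)."""

    def __init__(self):
        self.statements = []  # every SQL string passed to execute()
        self.rows = {"flights_demo": [("AA", 5), ("UA", 12)]}  # made-up data

    def get_tables(self):
        return sorted(self.rows)

    def get_table_details(self, name):
        return [Column("carrier", "STR"), Column("depdelay", "SMALLINT")]

    def execute(self, sql):
        self.statements.append(sql)
        if sql.lstrip().upper().startswith("SELECT"):
            table = sql.split()[-1]  # toy parsing: table name is the last word
            return iter(self.rows.get(table, []))
        return iter(())

    def load_table(self, name, data):
        self.rows.setdefault(name, []).extend(data)


def run_demo(con, source, target):
    """Follow the slide's steps: list, describe, query, create, load."""
    tables = con.get_tables()
    columns = [col.name for col in con.get_table_details(source)]
    result = list(con.execute(f"SELECT carrier, depdelay FROM {source}"))
    con.execute(
        f"CREATE TABLE {target} (carrier TEXT ENCODING DICT(32), depdelay SMALLINT)"
    )
    con.load_table(target, result)
    return tables, columns, result


con = StubConnection()
tables, columns, result = run_demo(con, "flights_demo", "delays_copy")
print(tables, columns, result)
```

Against a real instance, `con` would be the object returned by `pymapd.connect(...)`, and `run_demo` would be called the same way.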
  9. OmniSci Pymapd ML Demo
    • Jupyter Notebook: https://github.com/omnisci/pymapd-workshop/blob/master/flights_depdelay_cudf.ipynb
    • Connect to OmniSci database
    • Query departure delay & other features from flights table
    • Read data from query into cuDF dataframe
    • Prepping dataframe for model analysis
    • Using OLS (Ordinary Least Squares) to find feature impact on departure delay
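To make the last step concrete, here is a minimal sketch of ordinary least squares for a single feature, written in plain Python rather than with the notebook's libraries. The `ols_fit` helper and the toy values are assumptions for illustration only; they are not data from the flights table, and the notebook may fit several features at once.

```python
def ols_fit(x, y):
    """Closed-form OLS for one feature: returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical toy values: one feature and a departure delay for four flights.
feature = [1.0, 2.0, 3.0, 4.0]
depdelay = [3.0, 5.0, 7.0, 9.0]

slope, intercept = ols_fit(feature, depdelay)
print(slope, intercept)  # prints: 2.0 1.0
```

The slope is the quantity the slide calls feature impact: the estimated change in departure delay per unit change in the feature.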
  10. Next Steps
    • omnisci.com/blog: Read interesting stories on product usage
    • omnisci.com/demos: Play with our live demos for yourself!
    • omnisci.cloud: Get an OmniSci instance in 60 seconds
    • omnisci.com/platform/downloads/: Download a 30-day trial of OmniSci
    • community.omnisci.com: Ask questions and share your experiences
  11. Community Day (October 21st) is free to all attendees. Use Community50 for 50% off the entire conference!
    Date: October 21 - 23
    Location: Computer History Museum, 1401 N Shoreline Blvd, Mountain View, CA 94043
    converge.omnisci.com