Accelerating the Machine Learning Pipeline on Very Large Datasets with the GPU Data Frame | QCon.AI

OmniSci
April 11, 2018

Veda Shankar (Senior Developer Advocate) and Wamsi Viswanath (Data Scientist) demonstrate, on a very large dataset, how to manage a full machine learning pipeline with minimal data exchange overhead between MapD’s SQL engine and H2O’s generalized linear model (GLM) library.

Transcript

  1. © MapD 2018 Accelerating the Machine Learning Pipeline on Very Large Datasets with the GPU Data Frame

    Veda Shankar & Wamsi Viswanath | April 11, 2018
  2. Veda Shankar, Sr Developer Advocate, MapD Community, [email protected], slides: https://speakerdeck.com/mapd

    Wamsi Viswanath, Data Scientist, MapD, [email protected]
  3. Advanced memory management: three-tier caching to GPU RAM for speed and to SSDs for persistent storage

    COMPUTE LAYER
    • GPU RAM (L1): 24GB to 256GB, 1000-6000 GB/sec. Hot data: speedup = 1500x to 5000x over cold data
    • CPU RAM (L2): 32GB to 3TB, 70-120 GB/sec. Warm data: speedup = 35x to 120x over cold data
    STORAGE LAYER
    • SSD or NVRAM storage (L3): 250GB to 20TB, 1-2 GB/sec. Cold data
    Data Lake/Data Warehouse/System Of Record
  4. The GPU Open Analytics Initiative (GOAI): seamless data interchange framework in GPU memory
  5. The GPU Open Analytics Initiative (GOAI): creating common data frameworks to accelerate data science on GPUs

    /mapd/pymapd
    /gpuopenanalytics/pygdf
  6. Machine Learning Pipeline: Personas in Analytics Lifecycle (Illustrative)

    Personas: Business Analyst, Data Scientist, Data Engineer, IT Systems Admin, Data Scientist / Business Analyst
    Stages: Data Preparation, Data Discovery & Feature Engineering, Model & Validate, Predict, Operationalize, Monitoring & Refinement, Evaluate & Decide
    GPUs
  7. GitHub ML Examples

    • We’ve published a few notebooks showing how to connect to a MapD database and use an ML algorithm to make predictions
    /gpuopenanalytics/demo-docker
    /mapd/mapd-ml-demo
  8. Next Steps

    • community.mapd.com: Ask questions and share your experiences
    • mapd.com/cloud: Try 14-day free trial, no credit card needed
    • mapd.com/demos: Play with our demos
    • mapd.com/platform/download-community/: Get our free Community Edition and start playing
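
The "connect to a MapD database" step mentioned on slide 7 might look roughly like the sketch below. It is not taken from the published notebooks. It assumes the pymapd client listed on slide 5, MapD's default local credentials, and a hypothetical table and column names (`flights_2008_10k`, `depdelay`, `distance`, `arrdelay`). The helper names `build_feature_query` and `fetch_gpu_frame` are made up for illustration. The point is that `select_ipc_gpu` hands back the query result as a GPU data frame, so the features can stay in GPU memory for the ML step.

```python
def build_feature_query(table, feature_columns, label_column, limit=None):
    """Build a SELECT that pulls only the columns the model needs.

    Hypothetical helper: table and column names are supplied by the caller
    and are not validated against the database.
    """
    columns = ", ".join(list(feature_columns) + [label_column])
    query = f"SELECT {columns} FROM {table}"
    if limit is not None:
        query += f" LIMIT {int(limit)}"
    return query


def fetch_gpu_frame(query, user="mapd", password="HyperInteractive",
                    host="localhost", dbname="mapd"):
    """Run the query and return (connection, GPU data frame).

    Assumes a running MapD server reachable with these credentials
    (the defaults here are an assumption, adjust for your install).
    """
    # Imported lazily so build_feature_query works without pymapd installed.
    from pymapd import connect

    con = connect(user=user, password=password, host=host, dbname=dbname)
    # select_ipc_gpu returns the result set as a data frame in GPU memory.
    gdf = con.select_ipc_gpu(query)
    # The connection is returned so the caller decides when to close it.
    return con, gdf


# Build the query for a hypothetical flights table.
query = build_feature_query(
    "flights_2008_10k", ["depdelay", "distance"], "arrdelay", limit=1000
)
print(query)
```

With a MapD server running, `con, gdf = fetch_gpu_frame(query)` would be the next call, and `gdf` could then be passed to a GPU ML library such as the H2O GLM mentioned in the talk description. That last step is left out here because its exact API is not shown in the slides.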