
18-06-19_Austin_Big_Data_with_NVIDIA.pdf

In this tech talk, Aaron Williams, VP of Global Community at MapD, will focus on how the key technical differentiators of GPUs (their massive computational bandwidth, fast memory, and native rendering pipeline) make them uniquely suited to letting analysts and data scientists query, visualize, and power machine learning over large, often high-velocity datasets. Using the open-source MapD analytics platform as an example, Aaron will detail the technical approaches MapD took to leverage the full parallelism of GPUs and demo how the platform lets analysts interactively explore datasets containing tens of billions of records.

OmniSci

June 17, 2018

Transcript

  1. Speed at Scale: Using GPUs to Accelerate Analytics for Extreme Use Cases
    Austin Big Data Meetup, June 19, 2018
    © MapD 2018
  2. Aaron Williams, VP of Global Community
    @_arw_, [email protected], /in/aaronwilliams/, /williamsaaron
    Slides: https://speakerdeck.com/mapd
  3. The Evolution of Data as a Weapon
    Collect It → Make It Actionable → Make It Predictive
  4. Core Density Makes a Huge Difference
    GPU processing: 40,000 cores. CPU processing: 20 cores. (*Fictitious example.)
    Latency: time to do a task. Throughput: number of tasks per unit time.
    CPU: 1 ns per task, so (1 task/ns) × (20 cores) = 20 tasks/ns
    GPU: 10 ns per task, so (0.1 task/ns) × (40,000 cores) = 4,000 tasks/ns
    Each GPU core has 10x the latency of a CPU core, yet the GPU delivers 200x the throughput.
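The throughput arithmetic on this slide can be checked with a short Python sketch. It assumes, as the slide's fictitious example does, that every core works on an independent task with no contention, memory stalls, or scheduling overhead; the function name `throughput_tasks_per_ns` is illustrative.

```python
# Reproduces the slide's fictitious CPU vs GPU arithmetic.
# Assumption: every core runs an independent task with no contention or
# scheduling overhead, so throughput = (1 / latency) * cores.


def throughput_tasks_per_ns(latency_ns: float, cores: int) -> float:
    """Tasks completed per nanosecond when `cores` tasks run concurrently."""
    # Equivalent to (1 / latency_ns) * cores, written to avoid rounding 1/10.
    return cores / latency_ns


cpu = throughput_tasks_per_ns(latency_ns=1, cores=20)
gpu = throughput_tasks_per_ns(latency_ns=10, cores=40_000)

print(f"CPU: {cpu:g} tasks/ns")                    # CPU: 20 tasks/ns
print(f"GPU: {gpu:g} tasks/ns")                    # GPU: 4000 tasks/ns
print(f"GPU/CPU throughput ratio: {gpu / cpu:g}x")  # GPU/CPU throughput ratio: 200x
```

The point of the example is that higher per-task latency is more than offset by core count once the workload is parallel enough to keep every core busy.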
  5. Advanced Memory Management
    Three-tier caching to GPU RAM for speed and to SSDs for persistent storage
    • GPU RAM (L1): 24 GB to 256 GB at 1,000-6,000 GB/sec. Hot data: 1,500x to 5,000x speedup over cold data
    • CPU RAM (L2): 32 GB to 3 TB at 70-120 GB/sec. Warm data: 35x to 120x speedup over cold data
    • SSD or NVRAM storage (L3): 250 GB to 20 TB at 1-2 GB/sec. Cold data
    Diagram layers: compute layer; storage layer (Data Lake / Data Warehouse / System of Record)
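A minimal sketch of the three-tier idea, assuming a read-through, inclusive cache with least-recently-used (LRU) eviction in the two memory tiers. The class `TieredCache`, its capacities (counted in chunks rather than bytes), and its promotion policy are hypothetical illustrations, not MapD's actual buffer manager.

```python
from collections import OrderedDict


class TieredCache:
    """Hypothetical three-tier read-through cache.

    L1 and L2 are small LRU caches standing in for GPU RAM and CPU RAM;
    `storage` is a dict standing in for SSD/NVRAM that holds every chunk.
    Capacities are counted in chunks, not bytes.
    """

    def __init__(self, l1_capacity: int, l2_capacity: int, storage: dict):
        self.l1: OrderedDict = OrderedDict()
        self.l2: OrderedDict = OrderedDict()
        self.l1_capacity = l1_capacity
        self.l2_capacity = l2_capacity
        self.storage = storage
        self.hits = {"L1": 0, "L2": 0, "L3": 0}

    @staticmethod
    def _put(tier: OrderedDict, capacity: int, key, value) -> None:
        # Insert as most recently used, then evict the least recently used.
        tier[key] = value
        tier.move_to_end(key)
        if len(tier) > capacity:
            tier.popitem(last=False)

    def get(self, key):
        if key in self.l1:  # hot data: already in "GPU RAM"
            self.hits["L1"] += 1
            self.l1.move_to_end(key)
            return self.l1[key]
        if key in self.l2:  # warm data: promote from "CPU RAM" to L1
            self.hits["L2"] += 1
            value = self.l2[key]
            self.l2.move_to_end(key)
        else:  # cold data: read from storage, then fill L2 as well
            self.hits["L3"] += 1
            value = self.storage[key]
            self._put(self.l2, self.l2_capacity, key, value)
        self._put(self.l1, self.l1_capacity, key, value)
        return value


storage = {f"chunk{i}": i for i in range(10)}
cache = TieredCache(l1_capacity=2, l2_capacity=4, storage=storage)
for key in ["chunk0", "chunk1", "chunk0", "chunk2", "chunk1"]:
    cache.get(key)
print(cache.hits)  # {'L1': 1, 'L2': 1, 'L3': 3}
```

In the usage run, `chunk1` is evicted from the small L1 by `chunk2` but is still found in L2 on its second access, which is the "warm data" path from the slide.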
  6. The GPU Open Analytics Initiative
    Creating common data frameworks to accelerate data science on GPUs
    /mapd/pymapd, /gpuopenanalytics/pygdf
  7. Machine Learning Pipeline: Personas in Analytics Lifecycle (Illustrative)
    Personas: Business Analyst, Data Scientist, Data Engineer, IT Systems Admin, Data Scientist / Business Analyst
    Stages: Data Preparation, Data Discovery & Feature Engineering, Model & Validate, Predict, Operationalize, Monitoring & Refinement, Evaluate & Decide
    Diagram label: GPUs
  8. ML Examples
    • We’ve published a few notebooks showing how to connect to a MapD database and use an ML algorithm to make predictions
    • We’ve also shared a real-world churn example, which we implemented with VW (Vowpal Wabbit)
    /gpuopenanalytics/demo-docker, /mapd/mapd-ml-demo
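Assuming VW here is Vowpal Wabbit, which learns online from hashed sparse features, the sketch below is a toy, standard-library stand-in for that style of model: logistic loss, the hashing trick, and one SGD step per example. It is not Vowpal Wabbit and not the code in /mapd/mapd-ml-demo; the class `HashedLogisticSGD` and the churn features and labels are made up for illustration.

```python
import math
import zlib


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class HashedLogisticSGD:
    """Hypothetical online logistic regression over hashed features."""

    def __init__(self, bits: int = 18, learning_rate: float = 0.5):
        self.size = 1 << bits
        self.weights = [0.0] * self.size
        self.learning_rate = learning_rate

    def _index(self, feature: str) -> int:
        # Hashing trick: map a feature name to a fixed-size weight vector.
        # crc32 is stable across runs, unlike Python's salted hash().
        return zlib.crc32(feature.encode("utf-8")) % self.size

    def predict_proba(self, features: dict) -> float:
        z = sum(self.weights[self._index(f)] * v for f, v in features.items())
        return _sigmoid(z)

    def learn(self, features: dict, label: int) -> None:
        # For log loss, d(loss)/dz = p - y with label y in {0, 1}.
        error = self.predict_proba(features) - label
        for f, v in features.items():
            self.weights[self._index(f)] -= self.learning_rate * error * v


# Made-up churn records: feature -> value, label 1 = churned.
data = [
    ({"plan=basic": 1, "support_calls": 4, "bias": 1}, 1),
    ({"plan=premium": 1, "support_calls": 0, "bias": 1}, 0),
    ({"plan=basic": 1, "support_calls": 3, "bias": 1}, 1),
    ({"plan=premium": 1, "support_calls": 1, "bias": 1}, 0),
]
model = HashedLogisticSGD()
for _ in range(20):  # several online passes over the toy data
    for features, label in data:
        model.learn(features, label)

p_churn = model.predict_proba({"plan=basic": 1, "support_calls": 5, "bias": 1})
p_stay = model.predict_proba({"plan=premium": 1, "support_calls": 0, "bias": 1})
print(f"P(churn | basic, 5 calls) = {p_churn:.2f}")
print(f"P(churn | premium, 0 calls) = {p_stay:.2f}")
```

Hashing keeps memory fixed regardless of how many distinct feature names appear, at the cost of occasional collisions between features that land in the same slot.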
  9. And Now … Native Geospatial!
    First data types: • POINT • LINE • POLYGON
    First functions: • DISTANCE • CONTAINS
    Get involved:
    • Roadmap being discussed in the MapD (OSS) working group: [email protected]
    • Beta available now: email Aaron at [email protected]
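To make the semantics of the first two functions concrete, here is a planar Python sketch: `distance` is Euclidean distance between two points and `contains` tests whether a polygon contains a point using even-odd ray casting. These are textbook techniques chosen for illustration; MapD's geospatial implementation, its SQL function names (OGC-style engines typically use `ST_Distance` and `ST_Contains`), and its handling of geographic coordinates may differ.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def distance(a: Point, b: Point) -> float:
    """Planar (Cartesian) distance between two points."""
    return math.hypot(b[0] - a[0], b[1] - a[1])


def contains(polygon: Sequence[Point], p: Point) -> bool:
    """Even-odd ray casting: count edges crossed by a ray from p toward +x.

    `polygon` is a simple ring of vertices; the closing edge is implied.
    Points exactly on an edge may be reported either way.
    """
    x, y = p
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through p can cross the ray.
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge meets that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


square = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
print(distance((0.0, 0.0), (3.0, 4.0)))  # 5.0
print(contains(square, (2.0, 2.0)))      # True
print(contains(square, (5.0, 2.0)))      # False
```

Ray casting handles concave polygons as well, since an odd number of crossings means the point is inside regardless of the polygon's shape.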
  10. Next Steps
    • mapd.com/demos: play with our demos
    • mapd.cloud: get a MapD instance in less than 60 seconds
    • mapd.com/platform/download-community/: download the Community Edition
    • community.mapd.com: ask questions and share your experiences
  11. Aaron Williams, VP of Global Community
    @_arw_, [email protected], /in/aaronwilliams/, /williamsaaron
    Slides: https://speakerdeck.com/mapd
    Thank you! Questions?