
OmniSci
September 20, 2018

Speed Meets Scale: Analyzing & Visualizing Billions of Data Points with GPUs

Nearly a decade ago, disk-based data analytics platforms began to be superseded by in-memory systems, which offered orders-of-magnitude more bandwidth than their predecessors. This technological sea change was driven in large part by memory prices falling to the point where it became viable to hold large working sets of data entirely in RAM. Today we are witnessing a similar paradigm shift as analytics workloads increasingly move from CPUs to GPUs, which offer far higher compute and memory bandwidth. Eight GPUs with 256GB of combined VRAM can fit in a single server, and while CPU memory bandwidth has increased only modestly over the last several years, GPUs are rapidly moving to stacked DRAM (High-Bandwidth Memory), meaning that by next year a single GPU will offer over a terabyte per second of bandwidth.
Using the MapD data analytics platform as an example, Aaron Williams will explain why data scientists and analysts who leverage GPUs will have an immense advantage over those on CPU alternatives. He will show how MapD's open-source GPU database and Immerse visualization platform use the massive parallelism and memory bandwidth of multiple GPUs to execute SQL queries and render complex visualizations over billions of rows in milliseconds, orders of magnitude faster than CPU systems. Aaron will also explain and demonstrate the three APIs available to developers who want to build custom applications that take advantage of the GPUs' speed.
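To see why terabyte-per-second bandwidth translates into millisecond query times over billions of rows, here is a back-of-envelope sketch. The 8-byte column width and the 1000 GB/sec figure are illustrative assumptions drawn from the abstract, not MapD benchmark numbers:

```python
# Rough scan-time estimate: how long it takes to stream a column of a
# billion-row table through memory at a given bandwidth.

def scan_time_ms(rows: int, bytes_per_row: int, bandwidth_gb_per_s: float) -> float:
    """Milliseconds to stream `rows` * `bytes_per_row` bytes at the given bandwidth."""
    total_bytes = rows * bytes_per_row
    bytes_per_ms = bandwidth_gb_per_s * 1e6  # 1 GB/s = 1e6 bytes per millisecond
    return total_bytes / bytes_per_ms

# One 8-byte column of a billion-row table at 1 TB/s (1000 GB/s) of GPU bandwidth:
print(scan_time_ms(1_000_000_000, 8, 1000))  # 8.0 (milliseconds)
```

The same scan at a typical CPU memory bandwidth of roughly 100 GB/sec would take about 80 ms, which is where the order-of-magnitude claim comes from.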

Transcript

  1. Speed Meets Scale: Analyzing & Visualizing Billions of Data Points with GPUs. DB Tech Showcase | Tokyo | September 20, 2018
  2. © MapD 2018. Aaron Williams, VP of Global Community | @_arw_ | [email protected] | /in/aaronwilliams/ | /williamsaaron | slides: https://speakerdeck.com/mapd
  3. Personas in the Analytics Lifecycle (illustrative). Personas: Business Analyst, Data Scientist, Data Engineer, IT Systems Admin. Lifecycle stages (Data Scientist / Business Analyst): Data Preparation, Data Discovery & Feature Engineering, Model & Validate, Predict, Operationalize, Monitoring & Refinement, Evaluate & Decide. GPUs: Friday, Sept 21.
  4. GPU Processing vs. CPU Processing (*fictitious example): 40,000 GPU cores vs. 20 CPU cores. Latency: time to do a task. Throughput: number of tasks per unit time. CPU: 1 ns per task; (1 task/ns) x (20 cores) = 20 tasks/ns. GPU: 10 ns per task; (0.1 tasks/ns) x (40,000 cores) = 4,000 tasks/ns.
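The slide's fictitious arithmetic can be checked in a few lines. The core counts and per-task latencies are the slide's illustrative numbers, not real hardware specs:

```python
# Fictitious numbers from the slide: the GPU pays 10x the per-task latency
# but has 2,000x the cores, so its aggregate throughput wins by 200x.

def throughput_tasks_per_ns(latency_ns: float, cores: int) -> float:
    """Tasks completed per nanosecond across all cores (each core finishes 1/latency per ns)."""
    return cores / latency_ns

cpu = throughput_tasks_per_ns(latency_ns=1, cores=20)       # 20.0 tasks/ns
gpu = throughput_tasks_per_ns(latency_ns=10, cores=40_000)  # 4000.0 tasks/ns
print(f"GPU/CPU throughput ratio: {gpu / cpu:.0f}x")        # 200x
```

This is the key trade-off of the slide: higher latency per task, vastly higher throughput overall.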
  5. Advanced memory management: three-tier caching, to GPU RAM for speed and to SSDs for persistent storage. Compute layer: GPU RAM (L1), 24GB to 256GB, 1000-6000 GB/sec (hot data: 1500x to 5000x speedup over cold data); CPU RAM (L2), 32GB to 3TB, 70-120 GB/sec (warm data: 35x to 120x speedup over cold data). Storage layer: SSD or NVRAM (L3), 250GB to 20TB, 1-2 GB/sec (cold data), backed by the data lake / data warehouse / system of record.
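The speedup ranges on the slide roughly follow from dividing each tier's bandwidth by cold-storage bandwidth. The bandwidth ranges below are the slide's; the division is my own arithmetic, so the hot-tier ratio comes out somewhat wider than the slide's quoted 1500x to 5000x:

```python
# Bandwidth ratios of each caching tier relative to cold (SSD) storage,
# using the GB/sec ranges from the slide.
tiers = {
    "GPU RAM (L1, hot)":   (1000, 6000),  # GB/sec
    "CPU RAM (L2, warm)":  (70, 120),
    "SSD/NVRAM (L3, cold)": (1, 2),
}
cold_lo, cold_hi = tiers["SSD/NVRAM (L3, cold)"]
for name in ("GPU RAM (L1, hot)", "CPU RAM (L2, warm)"):
    lo, hi = tiers[name]
    # Worst case pairs the tier's lowest bandwidth with the fastest cold
    # storage; best case pairs its highest bandwidth with the slowest.
    print(f"{name}: {lo // cold_hi}x to {hi // cold_lo}x over cold data")
```

The warm-tier result (35x to 120x) matches the slide exactly; the quoted hot-tier range is presumably narrowed to typical configurations.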
  6. About MapD: top-tier venture backing | $37 million in funding | used by 100+ global orgs | open-source community.
  7. Next Steps: mapd.com/demos (play with our demos; every demo you saw in this talk was live!) | mapd.cloud (get a MapD instance in less than 60 seconds) | www.mapd.com/platform/downloads/ (download the Community Edition) | community.mapd.com (ask questions and share your experiences).
  8. Thank you! Questions? Aaron Williams, VP of Global Community | @_arw_ | [email protected] | /in/aaronwilliams/ | /williamsaaron | slides: https://speakerdeck.com/mapd