Lunch & Learn: Visualizing Billions of Data Points with GPUs

Nearly a decade ago, disk-based data analytics platforms began to be superseded by in-memory systems, which offered orders of magnitude more bandwidth than their predecessors. This sea change was driven in large part by memory prices falling to the point where it became viable to hold large working sets of data entirely in RAM. Today we are witnessing a similar shift as analytics workloads move from CPUs to GPUs, which offer much higher compute throughput and memory bandwidth. Eight GPUs with a combined 256GB of VRAM can fit into a single server. And while CPU memory bandwidth has increased only modestly over the last several years, GPUs are rapidly moving to stacked DRAM (High-Bandwidth Memory), meaning that by next year a single GPU will possess over a terabyte per second of bandwidth.

Using the MapD data analytics platform as an example, Aaron Williams will explain why data scientists and analysts who leverage GPUs will have an immense advantage over those relying on CPU alternatives. He will show how MapD's open source GPU database and Immerse visualization platform use the massive parallelism and memory bandwidth of multiple GPUs to execute SQL queries and render complex visualizations over billions of rows in milliseconds, orders of magnitude faster than CPU systems. Aaron will also explain and demonstrate the three APIs available to developers who want to build their own custom applications that take advantage of the speed of GPUs.

MapD has collaborated with researchers at the Harvard Center for Geographic Analysis to provide truly interactive access to National Water Model (NWM) predictions for stream flow and ground saturation across the entire continental US. Aaron will briefly cover MapD’s GPU-based analytics platform and how it has advanced the visualization and analysis of geospatial data, using the NWM as a use case.

OmniSci

June 17, 2018

Transcript

  1. © MapD 2018 Lunch & Learn: Visualizing Billions of Data Points with GPUs

    DFW Data Visualization & Infographics | June 20, 2018
  2. Aaron Williams, VP of Global Community

    @_arw_ [email protected] /in/aaronwilliams/ /williamsaaron slides: https://speakerdeck.com/mapd
  3. “Every business will become a software business, build applications, use advanced analytics and provide SaaS services.” - Smart CEO Guy has
  4. Core Density Makes a Huge Difference

    CPU Processing: 20 Cores. GPU Processing: 40,000 Cores. (*fictitious example)

    CPU: latency of 1 ns per task (1 task/ns) x 20 cores = throughput of 20 tasks/ns
    GPU: latency of 10 ns per task (0.1 task/ns) x 40,000 cores = throughput of 4,000 tasks/ns

    Latency: Time to do a task. | Throughput: Number of tasks per unit time.
  5. Advanced memory management: three-tier caching to GPU RAM for speed and to SSDs for persistent storage

    GPU RAM (L1): 24GB to 256GB, 1000-6000 GB/sec. Hot Data Speedup = 1500x to 5000x Over Cold Data
    CPU RAM (L2): 32GB to 3TB, 70-120 GB/sec. Warm Data Speedup = 35x to 120x Over Cold Data
    SSD or NVRAM STORAGE (L3): 250GB to 20TB, 1-2 GB/sec. Cold Data

    Diagram labels: COMPUTE LAYER, STORAGE LAYER, Data Lake/Data Warehouse/System Of Record
  6. And Now … Native Geospatial!

    First Data Types • POINT • LINE • POLYGON
    First Functions • DISTANCE • CONTAINS
    Get Involved
    • Roadmap Being Discussed: MapD (OSS) Working Group, [email protected]
    • Beta Available Now: Email Aaron - [email protected]
  7. Categories of common MapD use cases

    Operational Analytics • Thwart Banking Fraud • Scan for Cyber Threats • Fine-tune Advertising • Maintain the Utility Grid
    Geospatial Analytics • Monitor Networks • Ready Logistics • Forecast Micro-weather
    Data Science • Model Financial Markets • Predict Maintenance • Predict Staffing Levels
  8. The GPU Open Analytics Initiative (GOAI): Creating common data frameworks to accelerate data science on GPUs

    /mapd/pymapd /gpuopenanalytics/pygdf
  9. Next Steps

    • mapd.com/demos: Play with our demos
    • mapd.cloud: Get a MapD instance in less than 60 seconds
    • mapd.com/platform/download-community/: Download the Community Edition
    • community.mapd.com: Ask questions and share your experiences
  10. Thank you! Questions?

    Aaron Williams, VP of Global Community
    @_arw_ [email protected] /in/aaronwilliams/ /williamsaaron slides: https://speakerdeck.com/mapd
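
The latency-versus-throughput arithmetic on slide 4 can be reproduced in a few lines. This is only a sketch of the slide's fictitious example: the function name `throughput_tasks_per_ns` is made up for illustration, and it assumes every core works independently on its own task with no scheduling or memory overhead.

```python
# Sketch of the slide 4 arithmetic. Core counts and latencies are the
# slide's fictitious figures, not measurements of real hardware.

def throughput_tasks_per_ns(latency_ns: float, cores: int) -> float:
    """Aggregate throughput, assuming each core independently finishes
    one task every `latency_ns` nanoseconds (hypothetical helper)."""
    return cores / latency_ns

cpu = throughput_tasks_per_ns(latency_ns=1, cores=20)       # fast cores, few of them
gpu = throughput_tasks_per_ns(latency_ns=10, cores=40_000)  # slower cores, many of them

print(f"CPU: {cpu:g} tasks/ns")
print(f"GPU: {gpu:g} tasks/ns")
print(f"GPU/CPU throughput ratio: {gpu / cpu:g}x")
```

Under these assumptions, each GPU core is ten times slower per task, yet the GPU's aggregate throughput comes out 200 times higher because it has 2,000 times as many cores.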