OmniSci
May 22, 2018

NVIDIA & MapD Meetup | Speed Meets Scale: Interactive Analysis of Large Datasets with GPUs

Thanks to the proliferation of geospatial data from sensors, smartphones, social media, and transportation systems, developers and users can explore and analyze that data to detect anomalies and discover game-changing insights. But given the slow query times of traditional computing systems, accessing those insights in a timely manner can be costly.

Enter open source GPU analytics. In this talk, Todd Mostak, CEO & Co-Founder of MapD, will take a deep dive into the capabilities of open source GPU analytics and the role they are playing in advancing the visualization and analysis of geospatial data across a variety of industries. In particular, he will explain how the GPU’s massive parallelism eliminates the need for data downsampling or pre-aggregation and gives users tools that go beyond the mainstream: interactively visualizing and exploring billions of rows of data in milliseconds, in real time.

Transcript

  1. Speed Meets Scale: Interactive Analysis of Large Datasets with GPUs

    Todd Mostak, May 22, 2018. © MapD 2018
  2. Data Is Growing Exponentially

    In the next two years, the world will add more than 20 zettabytes (20 million petabytes) of data.
  3. CPUs aren’t keeping pace with data growth

    Data growth: 40% per year. CPU processing power: 20% per year.
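As a rough illustration of the gap on this slide, the sketch below compounds the two quoted rates (40% and 20% per year) and reports how far data volume pulls ahead of CPU processing power. The function names and the chosen horizons are illustrative assumptions, not part of the talk.

```python
def compound(rate_per_year: float, years: int) -> float:
    """Growth factor after `years` of compounding at `rate_per_year`."""
    return (1.0 + rate_per_year) ** years


def growth_gap(years: int, data_rate: float = 0.40, cpu_rate: float = 0.20) -> float:
    """Ratio of data growth to CPU growth after `years` (rates from the slide)."""
    return compound(data_rate, years) / compound(cpu_rate, years)


if __name__ == "__main__":
    # Horizons are arbitrary examples.
    for years in (1, 5, 10):
        print(f"after {years:2d} years, data outgrows CPU power by {growth_gap(years):.2f}x")
```

Under these rates the gap roughly doubles in five years (about 2.2x).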
  4. Analysts, data scientists & decision-makers suffer

    Coping strategies are not silver bullets:
    • Down-sampling
    • Pre-aggregation/heavy indexing
    • Massive scale-out
  5. GPUs Provide a Way Forward

    GPU computing maintains a 1.5X performance gain per year, reaching 1000X by 2025. Single-threaded CPU performance: Moore’s Law slowing to 1.1X per year.
    [Chart: GPU-computing performance versus single-threaded CPU performance. Original data up to the year 2010 collected and plotted by M. Horowitz, F. Labonte, O. Shacham, K. Olukotun, L. Hammond, and C. Batten. New plot and data collected for 2010-2015 by K. Rupp.]
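The two annual rates on this slide can be compared with a short calculation. The sketch below works out how many years of 1.5X growth it takes to reach 1000X, and what 1.1X per year accumulates to over the same span. The transcript does not state the baseline year, so none is assumed here; the function names are illustrative.

```python
import math


def years_to_reach(target_gain: float, gain_per_year: float) -> float:
    """Years of compounding at `gain_per_year` needed to reach `target_gain`."""
    return math.log(target_gain) / math.log(gain_per_year)


def cumulative_gain(gain_per_year: float, years: float) -> float:
    """Total gain after compounding `gain_per_year` for `years`."""
    return gain_per_year ** years


if __name__ == "__main__":
    span = years_to_reach(1000, 1.5)  # roughly 17 years
    print(f"1.5X per year reaches 1000X in about {span:.1f} years")
    print(f"1.1X per year over the same span gives about {cumulative_gain(1.1, span):.1f}X")
```

Over roughly 17 years, 1.1X per year compounds to only about 5X, which is the contrast the slide draws.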
  6. MapD: Interactive analytics at scale, powered by GPUs

    • Query billions of rows in milliseconds: Get answers from your data with industry-standard SQL your team already knows, except hundreds of times faster.
    • Self-service visual exploration: Instantly find correlations and anomalies in your largest datasets.
    • Rapidly ingest massive datasets: Import batch and streaming data orders of magnitude faster than legacy applications.
  7. MapD Core

    Query billions of rows in milliseconds with the open-source SQL engine designed for the GPU.
  8. MapD Core: Performance starts with memory management

    Keeping data close to compute:
    • GPU RAM (L1), hot data: 24GB to 256GB, 1000-8000 GB/sec. Speedup = 1500x to 5000x over cold data.
    • CPU RAM (L2), warm data: 32GB to 3TB, 70-200 GB/sec. Speedup = 35x to 120x over cold data.
    • SSD or NVRAM storage (L3), cold data: 250GB to 20TB, 1-2 GB/sec.
    The diagram labels a compute layer and a storage layer, sitting above a data lake, data warehouse, or system of record.
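To give a feel for the bandwidth figures on this slide, the sketch below estimates how long a single full scan of a working set would take from each tier. It is a pure bandwidth model (size divided by throughput) and is not meant to reproduce the slide's quoted speedups. The 100 GB working-set size and all names are assumptions for illustration.

```python
# Bandwidth ranges in GB/sec, taken from the slide.
TIERS_GB_PER_SEC = {
    "GPU RAM (L1)": (1000, 8000),
    "CPU RAM (L2)": (70, 200),
    "SSD or NVRAM storage (L3)": (1, 2),
}


def scan_seconds(size_gb: float, bandwidth_gb_per_sec: float) -> float:
    """Time to read `size_gb` once at the given sustained bandwidth."""
    return size_gb / bandwidth_gb_per_sec


def scan_time_range(size_gb: float, tier: str) -> tuple[float, float]:
    """(best, worst) scan time in seconds for a tier's bandwidth range."""
    low_bw, high_bw = TIERS_GB_PER_SEC[tier]
    return scan_seconds(size_gb, high_bw), scan_seconds(size_gb, low_bw)


if __name__ == "__main__":
    working_set_gb = 100  # hypothetical size, small enough to fit every tier
    for tier in TIERS_GB_PER_SEC:
        best, worst = scan_time_range(working_set_gb, tier)
        print(f"{tier}: {best:.4g} s to {worst:.4g} s")
```

For a 100 GB working set, this estimate gives 0.0125 to 0.1 seconds from GPU RAM and 50 to 100 seconds from SSD or NVRAM.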
  9. Query compilation with LLVM

    MapD compiles queries with LLVM (Low Level Virtual Machine) to create one custom function.
    • Code at the speed of hardware: Queries run at speeds approaching those of hand-written functions.
    • Architecture-agnostic queries: LLVM enables generic targeting of different architectures (GPUs, x86, ARM, etc.).
    • Hybrid execution on GPU and CPU: Code can be generated to run queries across both.
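The idea of compiling a query into one custom function can be sketched without LLVM. The example below is an analogy only: it generates Python source for a single filter-and-sum query and compiles it once, in contrast with an interpreter that looks up the operator for every row. MapD generates LLVM IR rather than Python, and every name here (`compile_query`, `run_interpreted`, the sample columns) is a hypothetical stand-in.

```python
import operator

OPS = {">": operator.gt, "<": operator.lt, "==": operator.eq}


def run_interpreted(rows, column, op, threshold, agg_column):
    """Interpreter style: resolve the operator and column names on every row."""
    total = 0
    for row in rows:
        if OPS[op](row[column], threshold):
            total += row[agg_column]
    return total


def compile_query(column, op, threshold, agg_column):
    """Generate source for one specialised function and compile it once."""
    if op not in OPS:  # only whitelisted operators reach the generated code
        raise ValueError(f"unsupported operator: {op}")
    source = (
        "def query(rows):\n"
        "    total = 0\n"
        "    for row in rows:\n"
        f"        if row[{column!r}] {op} {threshold!r}:\n"
        f"            total += row[{agg_column!r}]\n"
        "    return total\n"
    )
    namespace = {}
    exec(compile(source, "<generated query>", "exec"), namespace)
    return namespace["query"]


# Hypothetical sample data: SELECT SUM(fare) WHERE distance > 2.0
rows = [
    {"fare": 12.5, "distance": 3.0},
    {"fare": 40.0, "distance": 11.2},
    {"fare": 7.0, "distance": 1.1},
]
query = compile_query("distance", ">", 2.0, "fare")
print(query(rows))  # 52.5
```

The generated function has the column names, operator, and constant baked in, so the per-row work contains no dispatch. That is the property the slide describes as approaching hand-written code.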
  10. Native Geospatial SQL Operators (Coming Soon)

    • New native data types
    • New spatial analytics functions
    • High-performance clustering and joins
    Standards:
    • ISO/IEC 13249-3
    • Open Geospatial Consortium
    • 1999 SQL/MM
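This slide lists spatial analytics functions without naming them. As one hypothetical example of the kind of predicate such functions evaluate, the sketch below tests whether a point lies inside a polygon using the standard ray-casting method. It is plain Python for illustration and says nothing about how MapD implements its operators. Points lying exactly on an edge are not handled specially.

```python
def point_in_polygon(x: float, y: float, polygon: list[tuple[float, float]]) -> bool:
    """Ray casting: count edge crossings of a ray running from (x, y) toward +x."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can cross the ray.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical test geometry: a 4 x 4 square.
square = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
print(point_in_polygon(2.0, 2.0, square))  # True
print(point_in_polygon(5.0, 2.0, square))  # False
```

Each point is tested independently of the others, so the same check can be applied to many points in parallel.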
  11. What our users are saying

    • “MapD’s low lag solutions eliminate the need for custom data engineering… MapD provides an interactive experience with big data like no other solution today.” – Open Source
    • “The old query times were between 30 and 60 minutes. With MapD we can access to multi-billion row datasets in an interactive manner, the beauty of all of it is… [it’s] all based on SQL… so there is no learning curve.” – Open Source
    • “We can explore billion points interactively within a second. The results are impressive, as one can basically interrogate the trained AI model on the fly using visual analytics.” – Open Source
    • “With our legacy system this one mega-query would take 18 hours to run. So, we had set up MapD and there’s the three of us sitting together, and we push enter to run the query for the first time in MapD. The result came back in under a second. Our jaws dropped, literally… It was just something we’d never experienced - something we thought we’d probably never experience.” – Open Source
  12. Case study: Verizon

    Telco operational analytics at scale
    About: A global leader delivering communications and technology solutions
    Challenge: Commercial data systems team ingests and analyzes over 10 billion rows of data every 5 days
    Results:
    • Manages authentication and enablement of all Apple devices
    • Accelerates operational decisions
    • Reduced latency on one report from 20 minutes to 2 seconds
    “MapD has taken commodity GPUs and turned them into a solution that can transform the analytics industry.” – Mark Smith, Executive Director, Verizon Ventures
  13. Case study: Simulmedia

    TV advertising analytics at scale
    About: Pioneer in the audience-based, advanced TV advertising industry
    Challenge: Derive insights from 300 million daily events from 40 sources
    Results:
    • Shows interactive picture of all TV advertising spend in milliseconds
    • Eliminated manual reporting
    • Provided self-serve, interactive dashboard to clients
    “We adopted MapD because it was pushing the boundaries of what's possible with technology…” – Kyle Hubert, Chief Scientist
  14. Case study: Volkswagen

    Instilling confidence in AI models
    About: VW Group Data:Lab uses MapD to visualize black-box AI and ML models
    Challenge: Poor confidence and transparency hamper implementation and regulatory approval for models
    Results:
    • Transparency of underlying data helps explain black-box models
    • AI model generates 500M rows, traversed interactively in MapD
    • Uses GPU Data Frame via GOAI
    “When you make these models transparent, you understand it. You have confidence, and other people will have confidence in actually implementing these models.” – Dr. Zach Izham, VW Data:Lab
    https://www.youtube.com/watch?v=-mBg-lFz5fQ
  15. Case study: Skyhook

    About: Mobile positioning provider, using sensor data for precise device location
    Challenge: Process 100k transactions per second, without down-sampling
    Results:
    • Offers clients immersive geospatial exploration of property tracts
    • Saves hours with auto-generated SQL queries
    • Eliminates need for pre-aggregation
    “This is a watershed moment for geospatial analytics. MapD is probably the most important advance in the last 15 years for a geospatial analyst who needs to tear into very large data tables and get answers in real time.” – Rich Sutton, VP of Geospatial
  16. Four Ways to Get Started

    • Community: Website download
    • Open source: GitHub
    • Cloud: MapD Cloud
    • Enterprise: Contact sales
  17. About MapD

    • Originated at MIT
    • Top-tier venture backing: $37 million in funding
    • Used by 100+ global organizations
    • Open-source community