
FOSS4G NA | Speed Meets Scale: Interactive Analysis of Large Datasets with GPUs

OmniSci
May 15, 2018

Thanks to the proliferation of geospatial data from sensors, smartphones, social media and transportation systems, developers and users can explore and analyze that data to detect anomalies and discover game-changing insights. But given the slow query times of traditional computing systems, accessing those insights in a timely manner can be costly.

Enter open source GPU analytics. In this talk, Todd Mostak, CEO & Co-Founder of MapD, will take a deep dive into the capabilities of open source GPU analytics and the role they are playing in advancing the visualization and analysis of geospatial data across a variety of industries. In particular, he will explain how the massive parallelism of GPUs eliminates the need for data downsampling or pre-aggregation, and gives users tools that go beyond the mainstream to interactively visualize and explore billions of rows of data in milliseconds, in real time.


Transcript

  1. Data Is Growing Exponentially

    In the next two years, the world will add more than 20 zettabytes (20 million petabytes).
  2. CPUs aren’t keeping pace with data growth

    Data growth: 40% per year. CPU processing power: 20% per year.
  3. Analysts, data scientists & decision-makers suffer

    Coping strategies are not silver bullets: • Down-sampling • Pre-aggregation/heavy indexing • Massive scale-out
  4. GPUs Provide a Way Forward

    GPU computing maintains a 1.5X performance gain per year: 1000X by 2025. Moore’s Law is slowing to 1.1X per year (single-threaded performance with CPU).
    Chart credits: original data up to the year 2010 collected and plotted by M. Horowitz, F. Labonte, O. Shacham, K. Olukotun, L. Hammond, and C. Batten. New plot and data collected for 2010-2015 by K. Rupp.
  5. MapD: Interactive analytics at scale, powered by GPUs

    • Query billions of rows in milliseconds: Get answers from your data with industry-standard SQL your team already knows, except hundreds of times faster.
    • Self-service visual exploration: Instantly find correlations and anomalies in your largest datasets.
    • Rapidly ingest massive datasets: Import batch and streaming data orders of magnitude faster than legacy applications.
  6. MapD Core

    Query billions of rows in milliseconds with the open-source SQL engine designed for the GPU.
  7. Keeping Data Close to Compute

    MapD Core: Performance starts with memory management
    • GPU RAM (L1): 24GB to 256GB, 1000-8000 GB/sec. Hot data: speedup = 1500x to 5000x over cold data
    • CPU RAM (L2): 32GB to 3TB, 70-200 GB/sec. Warm data: speedup = 35x to 120x over cold data
    • SSD or NVRAM storage (L3): 250GB to 20TB, 1-2 GB/sec. Cold data
    Diagram labels: Compute layer; Storage layer; Data Lake/Data Warehouse/System Of Record
  8. Query compilation with LLVM

    MapD compiles queries with LLVM (low-level virtual machine) to create one custom function.
    • Code at the speed of hardware: Queries run at speeds approaching those of hand-written functions
    • Architecture-agnostic queries: LLVM enables generic targeting of different architectures (GPUs, X86, ARM, etc.)
    • Hybrid execution on GPU and CPU: Code can be generated to run queries across both
  9. Native Geospatial SQL Operators (Coming Soon)

    • New native data types
    • New spatial analytics functions
    • High-performance clustering and joins
    Standards • ISO/IEC 13249-3 • Open Geospatial Consortium • 1999 SQL/MM
  10. What our users are saying

    “MapD’s low lag solutions eliminate the need for custom data engineering….MapD provides an interactive experience with big data like no other solution today.” – Open Source
    “The old query times were between 30 and 60 minutes. With MapD we can access to multi-billion row datasets in an interactive manner, the beauty of all of it is… [its] all based on SQL…so there is no learning curve” – Open Source
    “We can explore billion points interactively within a second. The results are impressive, as one can basically interrogate the trained AI model on the fly using visual analytics.” – Open Source
    “With our legacy system this one mega-query would take 18 hours to run. So, we had set up MapD and there’s the three of us sitting together, and we push enter to run the query for the first time in MapD. The result came back in under a second. Our jaws dropped, literally….It was just something we’d never experienced - something we thought we’d probably never experience.” – Open Source
  11. Case study: Skyhook

    “This is a watershed moment for geospatial analytics. MapD is probably the most important advance in the last 15 years for a geospatial analyst who needs to tear into very large data tables and get answers in real time.” – Rich Sutton, VP of Geospatial
    About: Mobile positioning provider, using sensor data for precise device location
    Challenge: Process 100k transactions per second, without down-sampling
    Results • Offers clients immersive geospatial exploration of property tracts • Saves hours with auto-generated SQL queries • Eliminates need for pre-aggregation
  12. Four Ways to Get Started

    • COMMUNITY: Website Download
    • OPEN SOURCE: GitHub
    • CLOUD: MapD Cloud
    • ENTERPRISE: Contact Sales
  13. About MapD

    • Originated at MIT
    • Top-tier venture backing
    • $37 million in funding
    • Used by 100+ global orgs
    • Open-source community
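
The growth rates quoted on slides 2 and 4 compound quickly, and a short calculation makes the implications concrete. The Python sketch below is an illustration added alongside the transcript, not part of the deck. It assumes each quoted rate is a constant annual multiplier, and the function and variable names are hypothetical.

```python
import math

# Annual growth multipliers as quoted in the transcript.
DATA_GROWTH = 1.40  # slide 2: data grows 40% per year
CPU_GROWTH = 1.20   # slide 2: CPU processing power grows 20% per year
GPU_GROWTH = 1.50   # slide 4: GPU computing gains 1.5X per year
MOORE_SLOW = 1.10   # slide 4: Moore's Law slowing to 1.1X per year


def years_to_reach(factor: float, annual_rate: float) -> float:
    """Years of compounding at annual_rate needed to grow by factor."""
    return math.log(factor) / math.log(annual_rate)


def compound(annual_rate: float, years: float) -> float:
    """Total growth multiplier after compounding for the given years."""
    return annual_rate ** years


# How fast data volume outruns CPU capacity (assumes constant rates).
gap_per_year = DATA_GROWTH / CPU_GROWTH
gap_doubling_years = years_to_reach(2.0, gap_per_year)

# How long a 1.5X annual gain needs to accumulate to 1000X,
# and what 1.1X per year delivers over the same period.
gpu_years_to_1000x = years_to_reach(1000.0, GPU_GROWTH)
cpu_gain_same_period = compound(MOORE_SLOW, gpu_years_to_1000x)

if __name__ == "__main__":
    print(f"Data/CPU gap widens {gap_per_year:.3f}x per year")
    print(f"Gap doubles every {gap_doubling_years:.1f} years")
    print(f"1000X at 1.5X per year takes {gpu_years_to_1000x:.1f} years")
    print(f"1.1X per year over that period gives {cpu_gain_same_period:.1f}X")
```

Under these assumptions the gap between data volume and CPU capacity doubles roughly every four and a half years. A 1.5X annual gain needs about 17 years to reach 1000X, so the "1000X by 2025" figure implies a baseline around 2008, while 1.1X per year yields only about 5X over the same span.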