OmniSci
August 14, 2017

Interactive Visualization of Large Geospatial Datasets with GPUs

FOSS4G Boston 2017
Presented By Todd Mostak, Co-Founder & CEO
August 16, 2017

Todd Mostak, co-founder and CEO of MapD, an open source GPU (graphics processing unit) database and visualization platform for real-time analytics, explores GPUs and their role in geovisualization. In particular, you will hear how complex visualizations of massive amounts of geospatial data are an ideal match for GPUs, unlocking extreme speeds for interactive data exploration and real-time insight generation. The ability to interact instantly with billions of rows of geospatial data can be used across industries such as ad tech, energy, financial services, government, retail and service providers, allowing them to quickly find anomalies and drill down to the individual level without pre-aggregating or downsampling.

Download our open source platform at http://www.mapd.com.
@mapd

Transcript

  1. GPUs offer a way forward

    GPU processing power: 50% per year. Data growth: 40% per year. CPU processing power: 20% per year. [Slide footer: Confidential & Proprietary]
  2. MapD: software optimized for the fastest hardware

    100x Faster Queries + Speed of Thought Visualization. MapD Core: an in-memory, relational, column store database powered by GPUs. MapD Immerse: a visual analytics engine that leverages the speed + rendering capabilities of MapD Core.
  3. Who is MapD?

    MapD was incubated in the MIT CSAIL database group, advised by Michael Stonebraker and Sam Madden (Vertica). MapD has captured the imagination of some of the most sophisticated investors in Silicon Valley and beyond. “It's completely amazing to access databases so large completely in-memory and to interact with it, create graphs out of it, query it with AI, visualize it, all in real time. Completely revolutionary stuff.” Jensen Huang, CEO
  4. Why MapD?

    GPUs will be as transformative to analytics as broadband was to the Internet. [Figure: timeline from 1991 to 2017 labeled Analytics 1.0, Analytics 2.0 and Analytics 3.0 (accelerated/enriched)]
  5. Where does MapD fit in?

    Complementing your entire data ecosystem. [Diagram: MapD Core (database, SQL, rendering engine, GPU acceleration) sits between inputs and outputs. Input: data warehouse; data lake, HDFS; streaming data (connector labels: JDBC, Kafka). Output: MapD Immerse client; 3rd-party viz (Tableau, Power BI); custom apps; machine learning (Continuum, H2O, TensorFlow); data science (Python, R) (connector labels: JDBC, ODBC, Thrift; GDF, Thrift)]
  6. MapD Core

    The world's fastest in-memory GPU database powers the world's most immersive data exploration experience
  7. Keeping Data Close to Compute

    MapD Core: Performance starts with memory management

    Compute layer
    • GPU RAM (L1): 24GB to 256GB, 1000-6000 GB/sec. Hot data: speedup of 1500x to 5000x over cold data
    • CPU RAM (L2): 32GB to 3TB, 70-120 GB/sec. Warm data: speedup of 35x to 120x over cold data

    Storage layer
    • SSD or NVRAM storage (L3): 250GB to 20TB, 1-2 GB/sec. Cold data
    • Data lake / data warehouse / system of record
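
As a rough illustration of why the memory tier matters, the sketch below estimates how long one full pass over a working set would take at the bandwidth ranges quoted on this slide. The 100 GB working-set size and the function names are assumptions made for illustration; real query times depend on much more than raw bandwidth.

```python
# Bandwidth ranges in GB/sec, as quoted on the slide.
TIERS = {
    "GPU RAM (L1)": (1000, 6000),
    "CPU RAM (L2)": (70, 120),
    "SSD or NVRAM (L3)": (1, 2),
}


def scan_seconds(size_gb, bandwidth_gb_per_sec):
    """Idealized time to stream size_gb once at the given bandwidth."""
    return size_gb / bandwidth_gb_per_sec


def scan_time_ranges(size_gb):
    """Return {tier: (best_case_seconds, worst_case_seconds)}."""
    return {
        tier: (scan_seconds(size_gb, high), scan_seconds(size_gb, low))
        for tier, (low, high) in TIERS.items()
    }


if __name__ == "__main__":
    # 100 GB is a hypothetical working-set size, not a figure from the talk.
    for tier, (best, worst) in scan_time_ranges(100).items():
        print(f"{tier}: {best:.3f} to {worst:.3f} s per full scan")
```

The estimate is deliberately idealized: it ignores latency, compression and compute cost, and only shows how the bandwidth gap between tiers translates into scan time.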
  8. Query Compilation with LLVM

    Traditional DBs can be highly inefficient
    • each operator in SQL is treated as a separate function
    • this incurs tremendous overhead and prevents vectorization

    MapD compiles queries with LLVM to create one custom function
    • Queries run at speeds approaching hand-written functions
    • LLVM enables generic targeting of different architectures (GPUs, x86, ARM, etc.)
    • Code can be generated to run a query on CPU and GPU simultaneously
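
The contrast on this slide can be sketched in a few lines of Python. This is only an analogy, not MapD's code generator: Python's built-in `compile`/`exec` stands in for LLVM, and the tuple-based expression tree and all function names are assumptions made for illustration. `interpret` walks the tree and makes one function call per operator per row, while `compile_filter` generates a single fused function for the whole predicate.

```python
import operator

# Binary operators supported by this toy expression language.
OPS = {
    ">": operator.gt,
    "<": operator.lt,
    "+": operator.add,
    "*": operator.mul,
    "and": lambda a, b: a and b,
}

# Expression nodes: ("col", name), ("const", value) or (op, left, right).


def interpret(expr, row):
    """Operator-at-a-time evaluation: one call per tree node, per row."""
    kind = expr[0]
    if kind == "col":
        return row[expr[1]]
    if kind == "const":
        return expr[1]
    return OPS[kind](interpret(expr[1], row), interpret(expr[2], row))


def to_source(expr):
    """Translate the expression tree into a Python source expression."""
    kind = expr[0]
    if kind == "col":
        return f"row[{expr[1]!r}]"
    if kind == "const":
        return repr(expr[1])
    return f"({to_source(expr[1])} {kind} {to_source(expr[2])})"


def compile_filter(expr):
    """Generate and compile one fused function for the whole predicate."""
    source = f"def fused(row):\n    return {to_source(expr)}\n"
    namespace = {}
    exec(compile(source, "<query>", "exec"), namespace)
    return namespace["fused"]


if __name__ == "__main__":
    # Roughly: WHERE fare > 10 AND distance * 2 < 9 (hypothetical columns).
    predicate = ("and",
                 (">", ("col", "fare"), ("const", 10)),
                 ("<", ("*", ("col", "distance"), ("const", 2)), ("const", 9)))
    fused = compile_filter(predicate)
    rows = [{"fare": 12, "distance": 3}, {"fare": 5, "distance": 1}]
    print([fused(r) for r in rows])
```

Both paths return the same answers; the fused version simply avoids the per-operator dispatch that the slide identifies as overhead.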
  9. The Fastest Database

    MapD’s innovation drives exceptional speed, scale and ROI. Noted DB blogger Mark Litwintschik has benchmarked MapD vs. major CPU systems on a billion-row taxi data set and found it to be between 74x and 3,500x faster than CPU DBs. [Table: benchmark results, sorted by the fastest time query 1 finished in (measured in seconds). * pre-computed date intervals]
  10. MapD Immerse: our hybrid approach

    • Basic charts are frontend rendered using D3 and other related toolkits
    • Scatterplots, pointmaps + polygons are backend rendered using the Iris Rendering Engine on GPUs
    • Geo-Viz is composited over a frontend rendered basemap
  11. Server side rendering

    The X-Factor

    Data goes from the compute (CUDA) to the graphics (OpenGL) pipeline without a copy and comes back as a compressed PNG (~100 KB) rather than raw data (> 1 GB)

    Vega Spec (a visualization grammar)
    • A declarative JSON format for creating visualization designs
    • Used to describe backend visualizations
    • Defines attributes of render primitives which can be driven by data columns and mapped by scales

    Shader Compilation Framework
    • Templatized: supports multiple types (ints, floats, colors, etc.) and multiple continuities (discrete, continuous)

    [Diagram: the frontend sends Vega to the backend, which runs query-to-render and returns a PNG]
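
To make "attributes driven by data columns and mapped by scales" concrete, here is a minimal sketch of how a declarative spec might be resolved into per-point render attributes. The spec layout is loosely in the style of Vega, but the field names, the `resolve_marks` helper and the sample columns are illustrative assumptions, not MapD's actual schema or API.

```python
def linear_scale(domain, pixel_range):
    """Map a numeric domain [d0, d1] linearly onto the range [r0, r1]."""
    d0, d1 = domain
    r0, r1 = pixel_range

    def scale(value):
        t = (value - d0) / (d1 - d0)
        return r0 + t * (r1 - r0)

    return scale


def ordinal_scale(domain, outputs, default=None):
    """Map discrete categories onto discrete outputs (for example colors)."""
    table = dict(zip(domain, outputs))

    def scale(value):
        return table.get(value, default)

    return scale


# Hypothetical spec: every name below is an assumption for illustration.
SPEC = {
    "width": 800,
    "height": 400,
    "scales": [
        {"name": "x", "type": "linear", "domain": [-180.0, 180.0], "range": [0, 800]},
        {"name": "y", "type": "linear", "domain": [-90.0, 90.0], "range": [400, 0]},
        {"name": "color", "type": "ordinal", "domain": ["en", "es"],
         "range": ["blue", "red"], "default": "gray"},
    ],
    "marks": [
        {"type": "points",
         "properties": {
             "x": {"scale": "x", "field": "lon"},
             "y": {"scale": "y", "field": "lat"},
             "fillColor": {"scale": "color", "field": "lang"},
         }},
    ],
}


def build_scales(spec):
    """Instantiate one callable per scale definition in the spec."""
    scales = {}
    for s in spec["scales"]:
        if s["type"] == "linear":
            scales[s["name"]] = linear_scale(s["domain"], s["range"])
        else:
            scales[s["name"]] = ordinal_scale(s["domain"], s["range"], s.get("default"))
    return scales


def resolve_marks(spec, rows):
    """Apply each property's scale to the data column that drives it."""
    scales = build_scales(spec)
    props = spec["marks"][0]["properties"]
    return [
        {attr: scales[p["scale"]](row[p["field"]]) for attr, p in props.items()}
        for row in rows
    ]
```

In a GPU renderer this mapping would happen in shaders rather than in Python; the sketch only shows the data flow from column to scale to render attribute.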
  12. Hit testing

    Render-to-data operation to get a row id
    • Use an auxiliary integer buffer to store row ids per pixel
    • Use PBOs (pixel buffer objects) for GPU-to-CPU transfer for caching
    • Apply a Gaussian-weighted kernel to resolve hits near boundaries

    Run a SQL query using the row id as a filter

    [Figure: example row-id buffer holding the values 0, 1 and 2]
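
The hit-testing steps above can be sketched on the CPU as follows. This is a minimal illustration under stated assumptions, not the renderer's implementation: the buffer is a plain list of lists, 0 is assumed to mean "no row drawn at this pixel", the kernel radius and sigma are arbitrary, and the `rowid` column and table name in the generated query are hypothetical.

```python
import math


def pick_row_id(id_buffer, px, py, radius=2, sigma=1.0):
    """Return the row id that best matches a click at pixel (px, py).

    Each non-zero id in the neighborhood accumulates a Gaussian weight based
    on its distance from the click, so clicks near a boundary resolve to the
    id with the strongest nearby coverage. Returns 0 if nothing is nearby.
    """
    height = len(id_buffer)
    width = len(id_buffer[0])
    weights = {}
    for y in range(max(0, py - radius), min(height, py + radius + 1)):
        for x in range(max(0, px - radius), min(width, px + radius + 1)):
            row_id = id_buffer[y][x]
            if row_id == 0:  # assumed sentinel for an empty pixel
                continue
            dist_sq = (x - px) ** 2 + (y - py) ** 2
            weight = math.exp(-dist_sq / (2.0 * sigma ** 2))
            weights[row_id] = weights.get(row_id, 0.0) + weight
    if not weights:
        return 0
    return max(weights, key=weights.get)


def hit_query(table, row_id):
    """Build the follow-up query that fetches the picked row."""
    return f"SELECT * FROM {table} WHERE rowid = {int(row_id)}"


if __name__ == "__main__":
    # A small hypothetical id buffer: two marks with ids 1 and 2.
    buffer = [
        [0, 0, 0, 0, 0, 0],
        [0, 1, 1, 0, 2, 0],
        [0, 1, 1, 0, 2, 0],
        [0, 0, 0, 0, 0, 0],
    ]
    picked = pick_row_id(buffer, 5, 0)
    print(picked, hit_query("flights", picked))
```

Storing ids in a separate integer buffer means the lookup touches only a handful of pixels, and the full row is fetched with a single filtered query.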
  13. Announcing Collaboration with Harvard CGA

    Center for Geographic Analysis at Harvard: Accelerating Geospatial Research

    How will MapD bring the power of GPUs to geospatial analytics?
    • First Project: Improving access to hydrological models used in water management and public safety. Faster visualization of datasets of the U.S. National Water Model and enriching them with data such as flood or drought vulnerabilities, local population densities, emergency response availability and social media sentiment about water policies
    • Building a Vibrant Open Source Community: CGA will identify opportunities to add new geospatial features to the MapD platform, improve its general interoperability and extend support for Open Geospatial Consortium (OGC) standards
  14. How is MapD being used? Verizon Wireless: Valuing speed and visualization

    How does interactive analysis with MapD Immerse allow Verizon to improve System Health?
    • Ease & Speed of Interactivity allowed analysts to see patterns of previously unknown issues using visual data
    • Macro view: Bird’s eye view of patterns; can see both data over 1 month vs. 1 day
    • Individual device events: Amongst billions of events, see patterns of events and drill down to a single event

    Note: Example MapD Immerse dashboard pictured. This is NOT representative of an actual Verizon dashboard.
  15. How is MapD being used? EOG: Exploring oil/gas well data at scale

    How does MapD enable real-time exploration of large geospatial petroleum datasets?
    • Democratizing real-time data discovery: Analysts and geologists are able to analyze data without the help of IT
    • Speed-of-thought interactivity: Able to zoom into different regions without waiting for queries to run
    • Speed of MapD Core + rendering capabilities of MapD Immerse enables interactive data discovery

    Note: Example MapD Immerse dashboard pictured. This is NOT representative of an actual EOG dashboard.
  16. Closing thoughts

    We are at an inflection point in compute and GPUs are set to dominate the coming decade.
  17. Closing thoughts

    GPUs allow users to scale up before needing to scale out, lowering performance-killing network overheads and decreasing hardware and administration costs.
  18. Closing thoughts

    Integrated Analytics on GPUs comprising querying, viz and ML provide critical efficiencies and capabilities not found in siloed systems.
  19. GOAI: End-to-end analytics on the GPU

    GPU Open Analytics Initiative: Fusing Machine Learning and GPU Analytics
  20. How is MapD being used?

    Enabling the next generation of analytics applications
    • Telecommunications: Predictive Network Performance, Customer Churn
    • Energy: Dynamic Oil Well Management
    • Federal: Geo-analytics, Cyber-security
    • Telematics: Real-time fleet management, Incentive-based insurance
    • Adtech: Segmentation analytics
    • Financial Services: Trading model generation, Real-time Risk, Fraud Anomaly Detection