Interactive Visualization of Large Geospatial Datasets with GPUs

Todd Mostak, Co-Founder & CEO
August 16, 2017

Confidential & Proprietary

GPUs offer a way forward
• GPU processing power: grows 50% per year
• Data: grows 40% per year
• CPU processing power: grows 20% per year
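The gap between these rates compounds quickly. The sketch below assumes each rate stays constant year over year (the slide does not say so) and computes the resulting growth multiples; `compound_growth` is an illustrative helper, not part of MapD.

```python
def compound_growth(annual_rate: float, years: int) -> float:
    """Growth multiple after `years` at a constant `annual_rate` (0.5 = 50%)."""
    return (1.0 + annual_rate) ** years


# Annual rates from the slide; holding them constant is an assumption.
RATES = {
    "GPU processing power": 0.50,
    "Data": 0.40,
    "CPU processing power": 0.20,
}

if __name__ == "__main__":
    years = 10
    for name, rate in RATES.items():
        print(f"{name}: {compound_growth(rate, years):.1f}x after {years} years")
    # How far data volume pulls ahead of CPU processing power.
    gap = compound_growth(RATES["Data"], years) / compound_growth(
        RATES["CPU processing power"], years
    )
    print(f"Data outgrows CPU power by {gap:.1f}x over {years} years")
```

With these rates, ten years gives roughly 57.7x for GPU processing power, 28.9x for data, and 6.2x for CPU processing power, so data would outgrow CPU power by about 4.7x over that decade.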

MapD: software optimized for the fastest hardware
• MapD Core (100x faster queries): an in-memory, relational, column store database powered by GPUs
• MapD Immerse (speed-of-thought visualization): a visual analytics engine that leverages the speed and rendering capabilities of MapD Core

Who is MapD?
MapD was incubated in the MIT CSAIL database group, advised by Michael Stonebraker and Sam Madden (Vertica). MapD has captured the imagination of some of the most sophisticated investors in Silicon Valley and beyond.
“It's completely amazing to access databases so large completely in-memory and to interact with it, create graphs out of it, query it with AI, visualize it, all in real time. Completely revolutionary stuff.” Jensen Huang, CEO, NVIDIA

Why MapD?
GPUs will be as transformative to analytics as broadband was to the Internet.
[Figure: analytics timeline from 1991 to 2017: Analytics 1.0, Analytics 2.0, and Analytics 3.0 (accelerated/enriched)]

Where does MapD fit in?
Complementing your entire data ecosystem
• Input: data warehouse, data lake (HDFS), and streaming data, ingested via JDBC and Kafka
• MapD Core database: SQL plus a rendering engine, with GPU acceleration
• Output: MapD Immerse client, 3rd-party viz (Tableau, Power BI), and custom apps via JDBC, ODBC, and Thrift; machine learning (Continuum, H2O, TensorFlow) via GDF and Thrift; data science in Python and R

MapD Core
The world's fastest in-memory GPU database powers the world's most immersive data exploration experience

Keeping Data Close to Compute
MapD Core: performance starts with memory management
• GPU RAM (L1): 24 GB to 256 GB at 1000-6000 GB/sec; hot data, 1500x to 5000x speedup over cold data
• CPU RAM (L2): 32 GB to 3 TB at 70-120 GB/sec; warm data, 35x to 120x speedup over cold data
• SSD or NVRAM storage (L3): 250 GB to 20 TB at 1-2 GB/sec; cold data
The compute layer sits on top of a storage layer: the data lake, data warehouse, or system of record.
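One way to read the bandwidth figures is as a lower bound on the time needed to stream a working set once from each tier. The sketch below uses the slide's bandwidth ranges and a hypothetical 100 GB dataset; the helper names are illustrative. Pairing the slowest warm bandwidth with the fastest cold bandwidth (70/2) and vice versa (120/1) reproduces the slide's 35x to 120x warm-data speedup. The same pairing for GPU RAM gives 500x to 6000x, so the slide's 1500x to 5000x hot-data range presumably rests on other assumptions.

```python
from typing import Tuple

Range = Tuple[float, float]

# Bandwidth ranges in GB/sec (low, high) per tier, taken from the slide.
TIERS = {
    "GPU RAM (L1)": (1000.0, 6000.0),
    "CPU RAM (L2)": (70.0, 120.0),
    "SSD/NVRAM (L3)": (1.0, 2.0),
}


def scan_seconds(dataset_gb: float, bandwidth_gb_per_s: float) -> float:
    """Lower bound on the time to stream `dataset_gb` once at a given bandwidth."""
    return dataset_gb / bandwidth_gb_per_s


def speedup_range(fast: Range, slow: Range) -> Range:
    """Min/max speedup of a fast tier over a slow one (worst vs. best pairing)."""
    return fast[0] / slow[1], fast[1] / slow[0]


if __name__ == "__main__":
    dataset_gb = 100.0  # hypothetical working set size
    for tier, (low, high) in TIERS.items():
        best, worst = scan_seconds(dataset_gb, high), scan_seconds(dataset_gb, low)
        print(f"{tier}: {best:.3f}-{worst:.3f} s per full scan")
    print("Warm over cold:", speedup_range(TIERS["CPU RAM (L2)"], TIERS["SSD/NVRAM (L3)"]))
```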

Query Compilation with LLVM
Traditional DBs can be highly inefficient:
• Each operator in SQL is treated as a separate function
• This incurs tremendous overhead and prevents vectorization
MapD compiles queries with LLVM to create one custom function:
• Queries run at speeds approaching hand-written functions
• LLVM enables generic targeting of different architectures (GPUs, x86, ARM, etc.)
• Code can be generated to run a query on CPU and GPU simultaneously
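A toy contrast between the two execution styles, for a hypothetical query `SELECT SUM(fare) FROM trips WHERE passengers > 1` over columnar data. Python's built-in `compile` stands in for LLVM here, so this is only an analogy for what MapD does, not its implementation: the interpreted path calls one function per operator and materializes intermediate lists, while the generated function fuses filter, projection, and aggregation into a single loop.

```python
from typing import Callable, Dict, List

# Columnar table: column name -> list of values (hypothetical schema).
Columns = Dict[str, List[float]]


# Operator-at-a-time: each SQL operator is its own function and
# materializes an intermediate result before the next one runs.
def op_filter(cols: Columns, column: str, threshold: float) -> List[int]:
    return [i for i, v in enumerate(cols[column]) if v > threshold]


def op_project(cols: Columns, column: str, rows: List[int]) -> List[float]:
    return [cols[column][i] for i in rows]


def op_sum(values: List[float]) -> float:
    return sum(values)


def run_interpreted(cols: Columns) -> float:
    rows = op_filter(cols, "passengers", 1)
    return op_sum(op_project(cols, "fare", rows))


# "Compiled": generate source for ONE function that fuses filter,
# projection, and aggregation into a single pass, then compile it.
# (Python's compile() is a stand-in for LLVM code generation.)
def compile_sum_where(
    agg_col: str, filter_col: str, threshold: float
) -> Callable[[Columns], float]:
    src = (
        "def fused(cols):\n"
        f"    agg = cols[{agg_col!r}]\n"
        f"    flt = cols[{filter_col!r}]\n"
        "    total = 0.0\n"
        "    for i in range(len(flt)):\n"
        f"        if flt[i] > {threshold!r}:\n"
        "            total += agg[i]\n"
        "    return total\n"
    )
    namespace: dict = {}
    exec(compile(src, "<generated query>", "exec"), namespace)
    return namespace["fused"]


if __name__ == "__main__":
    trips = {"passengers": [1, 2, 3, 1], "fare": [5.0, 7.5, 10.0, 4.0]}
    fused = compile_sum_where("fare", "passengers", 1)
    print(run_interpreted(trips), fused(trips))
```

Both paths return the same sum; the difference is that the fused function never allocates the intermediate row list or the projected values. In a real engine the generated code is LLVM IR, which can be vectorized and lowered to CPU or GPU machine code.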

The Fastest Database
MapD's innovation drives exceptional speed, scale and ROI
Noted DB blogger Mark Litwintschik has benchmarked MapD against major CPU systems on a billion-row taxi dataset and found it to be between 74x and 3,500x faster than CPU databases.
[Table: benchmark results, sorted by the time in seconds each system took to finish query 1; * marks pre-computed date intervals]

MapD Immerse
Lightning-fast visual analytics for the MapD Core database

MapD Immerse: our hybrid approach
• Basic charts are rendered on the frontend using D3 and related toolkits
• Scatterplots, pointmaps, and polygons are rendered on the backend by the Iris Rendering Engine on GPUs
• Geo-viz is composited over a frontend-rendered basemap

The X-Factor: server-side rendering
• Data goes from the compute (CUDA) pipeline to the graphics (OpenGL) pipeline without a copy, and comes back to the client as a compressed PNG (~100 KB) rather than raw data (> 1 GB)
Vega Spec (a visualization grammar)
• A declarative JSON format for creating visualization designs
• Used to describe backend visualizations
• Defines attributes of render primitives, which can be driven by data columns and mapped by scales
Shader Compilation Framework
• Templatized: supports multiple types (ints, floats, colors, etc.) and multiple continuities (discrete, continuous)
[Diagram: the frontend sends a Vega spec to the backend, which runs query-to-render and returns a PNG]
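To illustrate what "driven by data columns and mapped by scales" means, the sketch below defines a small Vega-style spec and resolves one mark's attributes on the CPU. The key names (`sql`, `points`, `fillColor`), the table `flights`, and its columns are assumptions made for illustration, not the exact MapD spec format; in MapD this mapping presumably runs in generated GPU shaders rather than a Python loop.

```python
import json
from typing import Dict, List, Sequence

# Hypothetical Vega-style spec: one SQL data source, two scales, one mark.
SPEC = {
    "width": 800,
    "height": 600,
    "data": [{"name": "pts", "sql": "SELECT lon, lat, carrier FROM flights"}],
    "scales": [
        {"name": "x", "type": "linear", "domain": [-180, 180], "range": [0, 800]},
        {"name": "color", "type": "ordinal", "domain": ["AA", "UA"],
         "range": ["#ff0000", "#0000ff"]},
    ],
    "marks": [{
        "type": "points",
        "from": {"data": "pts"},
        "properties": {
            "x": {"scale": "x", "field": "lon"},
            "fillColor": {"scale": "color", "field": "carrier"},
        },
    }],
}


def linear_scale(value: float, domain: Sequence[float], rng: Sequence[float]) -> float:
    """Continuous scale: map value linearly from domain to range."""
    t = (value - domain[0]) / (domain[1] - domain[0])
    return rng[0] + t * (rng[1] - rng[0])


def ordinal_scale(value: str, domain: Sequence[str], rng: Sequence[str],
                  default: str = "#808080") -> str:
    """Discrete scale: pick the range entry at the category's domain position."""
    return rng[domain.index(value)] if value in domain else default


def resolve_marks(spec: dict, rows: List[Dict]) -> List[Dict]:
    """Apply each mark property's scale to the matching data column, row by row."""
    scales = {s["name"]: s for s in spec["scales"]}
    props = spec["marks"][0]["properties"]
    resolved = []
    for row in rows:
        attrs = {}
        for attr, enc in props.items():
            scale = scales[enc["scale"]]
            fn = linear_scale if scale["type"] == "linear" else ordinal_scale
            attrs[attr] = fn(row[enc["field"]], scale["domain"], scale["range"])
        resolved.append(attrs)
    return resolved


if __name__ == "__main__":
    print(json.dumps(SPEC)[:60], "...")  # the small JSON the frontend would send
    rows = [{"lon": 0.0, "carrier": "UA"}, {"lon": -90.0, "carrier": "DL"}]
    print(resolve_marks(SPEC, rows))
```

A linear scale is a continuous mapping and an ordinal scale a discrete one, matching the two continuities the shader framework supports. Only the small JSON spec travels from frontend to backend; the rendered pixels come back as a PNG.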

Hit testing
A render-to-data operation to get a row id:
• Use an auxiliary integer buffer to store row ids per pixel
• Use PBOs (pixel buffer objects) for GPU-to-CPU transfer and caching
• Apply a Gaussian-weighted kernel to resolve hits near boundaries
• Run a SQL query using the row id as a filter
[Figure: example auxiliary buffer of per-pixel row ids]
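A minimal CPU-side sketch of the hit-testing steps, assuming the row-id buffer has already been read back (for example through a PBO) into a 2D list where 0 means "no row". The Gaussian `sigma`, the search `radius`, the `rowid` column, and the table name are hypothetical; the slide does not give MapD's actual kernel parameters or query.

```python
import math
from typing import Dict, List, Optional


def hit_test(id_buffer: List[List[int]], px: int, py: int,
             radius: int = 2, sigma: float = 1.0, empty: int = 0) -> Optional[int]:
    """Return the row id that best explains a click at (px, py), or None.

    id_buffer[y][x] holds the row id rendered at that pixel; `empty` marks
    pixels with no row (0 is an assumption of this sketch). Each id in the
    window is weighted by a Gaussian of its distance to the click, so clicks
    that land just off a mark can still hit it.
    """
    height, width = len(id_buffer), len(id_buffer[0])
    scores: Dict[int, float] = {}
    for y in range(max(0, py - radius), min(height, py + radius + 1)):
        for x in range(max(0, px - radius), min(width, px + radius + 1)):
            row_id = id_buffer[y][x]
            if row_id == empty:
                continue
            d2 = (x - px) ** 2 + (y - py) ** 2
            scores[row_id] = scores.get(row_id, 0.0) + math.exp(-d2 / (2 * sigma ** 2))
    if not scores:
        return None
    return max(scores, key=scores.get)


def row_lookup_sql(table: str, row_id: int) -> str:
    """Follow-up query filtering on the hit row id (hypothetical column name)."""
    return f"SELECT * FROM {table} WHERE rowid = {int(row_id)};"


if __name__ == "__main__":
    buffer = [
        [0, 0, 0, 0, 0],
        [0, 1, 0, 0, 2],
        [0, 1, 1, 0, 2],
        [0, 0, 0, 0, 2],
    ]
    hit = hit_test(buffer, px=2, py=1)  # click on an empty pixel near mark 1
    if hit is not None:
        print(row_lookup_sql("points_table", hit))
```

Summing weights per id, rather than taking the single nearest pixel, lets a larger mark near the cursor win over a stray pixel of a neighboring mark.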

HOW IS MAPD BEING USED TO ACCELERATE GEOSPATIAL ANALYTICS? 

Announcing Collaboration with Harvard CGA
Center for Geographic Analysis at Harvard: Accelerating Geospatial Research
How will MapD bring the power of GPUs to geospatial analytics?
• First project: improving access to hydrological models used in water management and public safety
• Faster visualization of U.S. National Water Model datasets, enriched with data such as flood or drought vulnerabilities, local population densities, emergency response availability, and social media sentiment about water policies
• Building a vibrant open source community: CGA will identify opportunities to add new geospatial features to the MapD platform, improve its general interoperability, and extend support for Open Geospatial Consortium (OGC) standards

How is MapD being used? Verizon Wireless: valuing speed and visualization
How does interactive analysis with MapD Immerse allow Verizon to improve system health?
• Ease and speed of interactivity: analysts could see patterns of previously unknown issues in visual data
• Macro view: a bird's-eye view of patterns across both 1 month and 1 day of data
• Individual device events: among billions of events, see patterns and drill down to a single event
Note: Example MapD Immerse dashboard pictured. This is NOT representative of an actual Verizon dashboard.

How is MapD being used? EOG: exploring oil/gas well data at scale
How does MapD enable real-time exploration of large geospatial petroleum datasets?
• Democratizing real-time data discovery: analysts and geologists can analyze data without the help of IT
• Speed-of-thought interactivity: users can zoom into different regions without waiting for queries to run
• The speed of MapD Core plus the rendering capabilities of MapD Immerse enables interactive data discovery
Note: Example MapD Immerse dashboard pictured. This is NOT representative of an actual EOG dashboard.

MapD, Now Open Source

DEMO

Closing thoughts
We are at an inflection point in compute, and GPUs are set to dominate the coming decade.

Closing thoughts
GPUs allow users to scale up before needing to scale out, lowering performance-killing network overheads and decreasing hardware and administration costs.

Closing thoughts
Integrated analytics on GPUs, comprising querying, viz, and ML, provides critical efficiencies and capabilities not found in siloed systems.

GOAI: End-to-end analytics on the GPU
GPU Open Analytics Initiative: fusing machine learning and GPU analytics

How is MapD being used? Enabling the next generation of analytics applications
• Telecommunications: predictive network performance, customer churn
• Energy: dynamic oil well management
• Federal: geo-analytics, cyber-security
• Telematics: real-time fleet management, incentive-based insurance
• Adtech: segmentation analytics
• Financial services: trading model generation, real-time risk, fraud and anomaly detection
