Slide 1

Slide 1 text

Interactive Visualization of Large Geospatial Datasets with GPUs
Todd Mostak, Co-Founder & CEO
August 16, 2017

Slide 2

Slide 2 text

Confidential & Proprietary
GPUs offer a way forward
• GPU Processing Power: 50% per year
• Data Growth: 40% per year
• CPU Processing Power: 20% per year
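The three growth rates on this slide can be compounded to see how compute per unit of data changes over time. The sketch below is only arithmetic on the slide's figures; the function names are illustrative and the assumption that each rate holds constant year over year is ours, not the slide's.

```python
def growth_factor(annual_rate, years):
    """Compound an annual growth rate (0.5 means 50% per year) over a number of years."""
    return (1.0 + annual_rate) ** years


def compute_per_unit_data(compute_rate, data_rate, years):
    """Ratio of compute growth to data growth; above 1 means compute is pulling ahead."""
    return growth_factor(compute_rate, years) / growth_factor(data_rate, years)


if __name__ == "__main__":
    # Rates taken from the slide: GPU 50%, data 40%, CPU 20% per year.
    for years in (1, 5, 10):
        gpu = compute_per_unit_data(0.50, 0.40, years)
        cpu = compute_per_unit_data(0.20, 0.40, years)
        print(f"{years:>2} years: GPU/data = {gpu:.2f}, CPU/data = {cpu:.2f}")
```

Under these assumptions the GPU ratio stays above 1 and the CPU ratio falls below 1, which is the gap the slide points at.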

Slide 3

Slide 3 text

MapD: software optimized for the fastest hardware
100x Faster Queries + Speed of Thought Visualization
• MapD Core: An in-memory, relational, column store database powered by GPUs
• MapD Immerse: A visual analytics engine that leverages the speed + rendering capabilities of MapD Core

Slide 4

Slide 4 text

Who is MapD?
MapD was incubated in the MIT CSAIL database group under the guidance of Michael Stonebraker and Sam Madden (Vertica).
MapD has captured the imagination of some of the most sophisticated investors in Silicon Valley and beyond.
“It's completely amazing to access databases so large completely in-memory and to interact with it, create graphs out of it, query it with AI, visualize it, all in real time. Completely revolutionary stuff.”
Jensen Huang, CEO

Slide 5

Slide 5 text

Why MapD?
GPUs will be as transformative to analytics as broadband was to the Internet.
[Figure: Analytics plotted against Time, 1991 to 2017, with stages labeled ANALYTICS 1.0, ANALYTICS 2.0, ANALYTICS 3.0 and ACCELERATED/ENRICHED]

Slide 6

Slide 6 text

Where does MapD fit in?
Complementing your entire data ecosystem
[Diagram: MapD Core Database (SQL, Rendering Engine, GPU ACCELERATION) sits between inputs and outputs]
Input:
• Data Warehouse
• Data Lake, HDFS
• Streaming Data
• Connector labels: JDBC, Kafka
Output:
• MapD Immerse Client
• 3rd Party Viz: Tableau, Power BI
• Custom Apps
• Data Science: Python, R
• Machine Learning: Continuum, H2O, TensorFlow
• Connector labels: JDBC, ODBC, Thrift; GDF, Thrift

Slide 7

Slide 7 text

MapD Core
The world's fastest in-memory GPU database powers the world's most immersive data exploration experience

Slide 8

Slide 8 text

Keeping Data Close to Compute
MapD Core: Performance starts with memory management
• GPU RAM (L1): 24GB to 256GB, 1000-6000 GB/sec. Hot Data: Speedup = 1500x to 5000x Over Cold Data
• CPU RAM (L2): 32GB to 3TB, 70-120 GB/sec. Warm Data: Speedup = 35x to 120x Over Cold Data
• SSD or NVRAM STORAGE (L3): 250GB to 20TB, 1-2 GB/sec. Cold Data
[Diagram labels: COMPUTE LAYER, STORAGE LAYER, Data Lake/Data Warehouse/System Of Record]
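To make the bandwidth figures concrete, the sketch below estimates how long a full scan of a dataset would take at each tier and computes naive bandwidth ratios between tiers. The tier names and bandwidth ranges come from the slide; the function names and the "scan time = size / bandwidth" model are simplifying assumptions. The CPU RAM to SSD ratio reproduces the slide's 35x to 120x. The GPU RAM to SSD ratio comes out as 500x to 6000x, which is wider than the quoted 1500x to 5000x; the slide does not say how its figure was derived.

```python
# (min, max) bandwidth in GB/sec, as listed on the slide.
TIERS = {
    "GPU RAM (L1)": (1000, 6000),
    "CPU RAM (L2)": (70, 120),
    "SSD or NVRAM (L3)": (1, 2),
}


def scan_seconds(size_gb, bandwidth_gb_per_sec):
    """Idealized time to stream size_gb once at the given bandwidth."""
    return size_gb / bandwidth_gb_per_sec


def speedup_range(fast_tier, slow_tier):
    """Worst-case and best-case bandwidth ratio of a fast tier over a slow tier."""
    fast_min, fast_max = fast_tier
    slow_min, slow_max = slow_tier
    return (fast_min / slow_max, fast_max / slow_min)


if __name__ == "__main__":
    cold = TIERS["SSD or NVRAM (L3)"]
    for name, tier in TIERS.items():
        slowest = scan_seconds(100, tier[0])
        low, high = speedup_range(tier, cold)
        print(f"{name}: 100 GB scan takes up to {slowest:.2f} s; "
              f"{low:.0f}x to {high:.0f}x vs. SSD")
```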

Slide 9

Slide 9 text

Query Compilation with LLVM
Traditional DBs can be highly inefficient
• Each operator in SQL is treated as a separate function
• This incurs tremendous overhead and prevents vectorization
MapD compiles queries with LLVM to create one custom function
• Queries run at speeds approaching hand-written functions
• LLVM enables generic targeting of different architectures (GPUs, x86, ARM, etc.)
• Code can be generated to run a query on CPU and GPU simultaneously
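The contrast between operator-at-a-time execution and a single compiled function can be illustrated with a Python analogy. This is not LLVM and not MapD's code generator: it generates Python source for one fused loop and compiles it with the built-in `compile`, purely to show the idea. All function names, the row format and the example query (`SELECT SUM(x) WHERE y > threshold`) are hypothetical.

```python
def filter_op(rows, predicate):
    # Separate operator: materializes an intermediate list.
    return [r for r in rows if predicate(r)]


def project_op(rows, column):
    # Separate operator: materializes another intermediate list.
    return [r[column] for r in rows]


def sum_op(values):
    return sum(values)


def interpreted_sum(rows, value_col, filter_col, threshold):
    """Operator-at-a-time plan: three calls, two intermediate results."""
    filtered = filter_op(rows, lambda r: r[filter_col] > threshold)
    return sum_op(project_op(filtered, value_col))


def compile_query(value_col, filter_col, threshold):
    """Generate and compile one fused function specialized to this query."""
    source = (
        "def fused(rows):\n"
        "    total = 0\n"
        "    for r in rows:\n"
        f"        if r[{filter_col!r}] > {threshold!r}:\n"
        f"            total += r[{value_col!r}]\n"
        "    return total\n"
    )
    namespace = {}
    exec(compile(source, "<query>", "exec"), namespace)
    return namespace["fused"]


ROWS = [{"x": 1, "y": 3}, {"x": 10, "y": 7}, {"x": 100, "y": 9}]

if __name__ == "__main__":
    fused = compile_query("x", "y", 5)
    print(interpreted_sum(ROWS, "x", "y", 5), fused(ROWS))
```

Both paths return the same answer; the fused version does it in one pass with the column names and constant baked into the generated code, which is the property the slide attributes to LLVM-compiled queries.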

Slide 10

Slide 10 text

The Fastest Database
MapD’s innovation drives exceptional speed, scale and ROI
Noted DB blogger Mark Litwintschik has benchmarked MapD against major CPU systems on a billion-row taxi data set and found it to be between 74x and 3,500x faster than CPU DBs.
[Table: benchmark results. The table is sorted by the fastest time query 1 finished in (measured in seconds). * pre-computed date intervals]

Slide 11

Slide 11 text

MapD Immerse
Lightning fast visual analytics for the MapD Core database

Slide 12

Slide 12 text

MapD Immerse: our hybrid approach
• Basic charts are frontend rendered using D3 and other related toolkits
• Scatterplots, pointmaps + polygons are backend rendered using the Iris Rendering Engine on GPUs
• Geo-Viz is composited over a frontend rendered basemap

Slide 13

Slide 13 text

The X-Factor
Server side rendering
• Data goes from the compute (CUDA) to the graphics (OpenGL) pipeline without a copy and comes back as a compressed PNG (~100 KB) rather than raw data (> 1 GB)
Vega Spec (a visualization grammar)
• A declarative JSON format for creating visualization designs
• Used to describe backend visualizations
• Defines attributes of render primitives which can be driven by data columns and mapped by scales
Shader Compilation Framework
• Templatized: supports multiple types (ints, floats, colors, etc.) and multiple continuities (discrete, continuous)
[Diagram: Frontend, Vega, Backend Query-to-Render, PNG]
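As a rough illustration of "render primitives driven by data columns and mapped by scales", the sketch below builds a small Vega-style spec as a Python dictionary and applies one linear scale by hand. The layout is loosely modeled on Vega; the exact field names, the SQL text, the table `trips` and the helper `apply_linear_scale` are assumptions for illustration and are not taken from the slides.

```python
import json

# Hypothetical Vega-style spec: a SQL-backed data source, one scale, one mark.
vega_spec = {
    "width": 800,
    "height": 600,
    "data": [
        {"name": "points", "sql": "SELECT lon AS x, lat AS y, amount FROM trips"}
    ],
    "scales": [
        {"name": "size", "type": "linear", "domain": [0, 100], "range": [1, 10]}
    ],
    "marks": [
        {
            "type": "points",
            "from": {"data": "points"},
            "properties": {
                "x": {"field": "x"},
                "y": {"field": "y"},
                # Point size is driven by the "amount" column through the scale.
                "size": {"scale": "size", "field": "amount"},
            },
        }
    ],
}


def apply_linear_scale(scale, value):
    """Map a data value from the scale's domain to its range, clamping at the ends."""
    d0, d1 = scale["domain"]
    r0, r1 = scale["range"]
    clamped = min(max(value, d0), d1)
    return r0 + (clamped - d0) / (d1 - d0) * (r1 - r0)


if __name__ == "__main__":
    print(json.dumps(vega_spec, indent=2))
    print(apply_linear_scale(vega_spec["scales"][0], 50))
```

Because the spec is plain JSON, the frontend only has to send this small document; the query, the scale mapping and the rendering all stay on the backend.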

Slide 14

Slide 14 text

Hit testing
Render-to-data operation to get a row id
• Use an auxiliary integer buffer to store row ids per-pixel
• Use PBOs (pixel buffer objects) for GPU-to-CPU transfer for caching
• Apply a Gaussian-weighted kernel to resolve hits near boundaries
Run a SQL query using row id as filter
[Figure: per-pixel row-id buffer; cells hold 0 for background and row ids 1 and 2 for rendered marks]
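The hit-testing steps above can be sketched on the CPU in plain Python. This is a minimal illustration, not the renderer's implementation: the buffer contents, the kernel radius and sigma, the table name `flights`, and the use of a `rowid` column in the generated SQL are all assumptions made for the example.

```python
import math

# Hypothetical per-pixel row-id buffer: 0 is background, 1 and 2 are rendered rows.
ID_BUFFER = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 2, 2],
    [0, 0, 0, 0, 2, 2],
    [0, 0, 0, 0, 0, 0],
]


def gaussian_hit_test(id_buffer, px, py, radius=1, sigma=1.0):
    """Return the row id with the largest Gaussian-weighted vote near pixel (px, py)."""
    height, width = len(id_buffer), len(id_buffer[0])
    scores = {}
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            x, y = px + dx, py + dy
            if not (0 <= x < width and 0 <= y < height):
                continue  # neighbor falls outside the buffer
            row_id = id_buffer[y][x]
            if row_id == 0:
                continue  # background pixels do not vote
            weight = math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))
            scores[row_id] = scores.get(row_id, 0.0) + weight
    if not scores:
        return None
    return max(scores, key=scores.get)


def build_row_query(table, row_id):
    """Build the follow-up lookup; assumes the table exposes a rowid column."""
    return f"SELECT * FROM {table} WHERE rowid = {int(row_id)}"


if __name__ == "__main__":
    hit = gaussian_hit_test(ID_BUFFER, 3, 3)  # cursor just outside row 2's pixels
    if hit is not None:
        print(build_row_query("flights", hit))
```

Weighting neighbors by distance lets a click that lands on a background pixel next to a mark still resolve to that mark, which is the boundary case the slide mentions.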

Slide 15

Slide 15 text

HOW IS MAPD BEING USED TO ACCELERATE GEOSPATIAL ANALYTICS?


Slide 16

Slide 16 text

Announcing Collaboration with Harvard CGA
Center for Geographic Analysis at Harvard – Accelerating Geospatial Research
How will MapD bring the power of GPUs to geospatial analytics?
• Faster visualization of U.S. National Water Model datasets, enriched with data such as flood or drought vulnerabilities, local population densities, emergency response availability and social media sentiment about water policies
• Building a Vibrant Open Source Community: CGA will identify opportunities to add new geospatial features to the MapD platform, improve its general interoperability and extend support for Open Geospatial Consortium (OGC) standards
• First Project: Improving access to hydrological models used in water management and public safety

Slide 17

Slide 17 text

How is MapD being used?
Verizon Wireless - Valuing speed and visualization
How does interactive analysis with MapD Immerse allow Verizon to improve System Health?
• Ease & Speed of Interactivity: allowed analysts to see patterns of previously unknown issues using visual data
• Macro view: Bird’s eye view of patterns; can view data over 1 month vs. 1 day
• Individual device events: Amongst billions of events, see patterns of events and drill down to a single event
Note: Example MapD Immerse dashboard pictured. This is NOT representative of an actual Verizon dashboard.

Slide 18

Slide 18 text

How is MapD being used?
EOG – Exploring oil/gas well data at scale
How does MapD enable real-time exploration of large geospatial petroleum datasets?
Speed of MapD Core + rendering capabilities of MapD Immerse enable interactive data discovery
• Democratizing real-time data discovery: Analysts and geologists are able to analyze data without the help of IT
• Speed-of-thought interactivity: Able to zoom into different regions without waiting for queries to run
Note: Example MapD Immerse dashboard pictured. This is NOT representative of an actual EOG dashboard.

Slide 19

Slide 19 text

MapD, Now Open Source

Slide 20

Slide 20 text

DEMO

Slide 21

Slide 21 text

Closing thoughts
We are at an inflection point in compute, and GPUs are set to dominate the coming decade.

Slide 22

Slide 22 text

Closing thoughts
GPUs allow users to scale up before needing to scale out, lowering performance-killing network overheads and decreasing hardware and administration costs.

Slide 23

Slide 23 text

Closing thoughts
Integrated analytics on GPUs, comprising querying, viz and ML, provides critical efficiencies and capabilities not found in siloed systems.

Slide 24

Slide 24 text

No content

Slide 25

Slide 25 text

GOAI: End-to-end analytics on the GPU
GPU Open Analytics Initiative – Fusing Machine Learning and GPU Analytics

Slide 26

Slide 26 text

How is MapD being used?
Enabling the next generation of analytics applications
• TELECOMMUNICATIONS: Predictive Network Performance, Customer Churn
• ENERGY: Dynamic Oil Well Management
• FEDERAL: Geo-analytics, Cyber-security
• TELEMATICS: Real-time fleet management, Incentive-based insurance
• ADTECH: Segmentation analytics
• FINANCIAL SERVICES: Trading model generation, Real-time Risk, Fraud Anomaly Detection