H2O & MapD Meetup at NVIDIA: GPU Accelerated AI

H2O & MapD: GPU Accelerated AI
Meetup at NVIDIA
Bill Maimone, VP of Engineering at MapD
Vinod Iyengar, Director of Business Development and Partnerships at H2O.ai
August 29, 2017

INTRO TO MAPD

Confidential & Proprietary
Who is MapD?
MapD was incubated in the MIT CSAIL database group, advised by Michael Stonebraker and Sam Madden (Vertica). MapD has captured the imagination of some of the most sophisticated investors in Silicon Valley and beyond.
“It's completely amazing to access databases so large completely in-memory and to interact with it, create graphs out of it, query it with AI, visualize it, all in real time. Completely revolutionary stuff.” Jensen Huang, CEO, NVIDIA

GPUs offer a way forward
• GPU processing power: 50% growth per year
• Data growth: 40% per year
• CPU processing power: 20% growth per year
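To make the gap between these rates concrete, here is a small compounding sketch. The three annual rates come from the slide; the five-year horizon and the function names are assumptions chosen only for illustration.

```python
# Annual growth rates as quoted on the slide.
RATES = {
    "GPU processing power": 0.50,
    "Data growth": 0.40,
    "CPU processing power": 0.20,
}


def compound(rate: float, years: int) -> float:
    """Growth factor after `years` of compounding at `rate` per year."""
    return (1.0 + rate) ** years


def growth_table(years: int) -> dict:
    """Growth factor for each quoted rate over the given horizon."""
    return {name: compound(rate, years) for name, rate in RATES.items()}


if __name__ == "__main__":
    # The 5-year horizon is an illustrative assumption, not from the slide.
    for name, factor in growth_table(5).items():
        print(f"{name}: {factor:.2f}x after 5 years")
```

Under these assumptions, CPU capacity falls further behind data volume every year, while GPU capacity stays ahead of it.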

Cores and memory define the difference
• GPU processing: 5,120 cores
• CPU processing (traditional): 24 cores

MapD: software optimized for the fastest hardware
• 100x faster queries + speed-of-thought visualization
• MapD Core: a fast, relational, column-store database powered by GPUs
• MapD Immerse: a visual analytics engine that leverages the speed and rendering capabilities of MapD Core

The evolution of analytics
• 1990s: Analytics 1.0
• 2000s: Analytics 2.0
• 2017: Analytics 3.0 (accelerated, predictive, dynamic)
GPUs will be as transformative to analytics as broadband was to the Internet.

Where does MapD fit in?
Complementing your entire data ecosystem

MapD Core
The world's fastest in-memory GPU database powers the world's most immersive data exploration experience

Keeping Data Close to Compute
MapD Core: Performance starts with memory management
• GPU RAM (L1): 24GB to 256GB, 1000-6000 GB/sec. Hot data: speedup of 1500x to 5000x over cold data
• CPU RAM (L2): 32GB to 3TB, 70-120 GB/sec. Warm data: speedup of 35x to 120x over cold data
• SSD or NVRAM storage (L3): 250GB to 20TB, 1-2 GB/sec. Cold data
Diagram layers: compute layer, storage layer, and the data lake / data warehouse / system of record
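The tiering argument is bandwidth arithmetic, which the following sketch works through. The bandwidth ranges are the ones quoted on the slide; the 100 GB working set and the helper names are assumptions. The warm-data range of 35x to 120x falls straight out of the quoted bandwidths (70/2 and 120/1). The raw GPU-to-storage bandwidth ratio spans 500x to 6000x, and the quoted hot-data range of 1500x to 5000x sits inside that.

```python
# Bandwidth ranges in GB/sec as quoted on the slide: (low, high).
TIERS = {
    "GPU RAM (L1)": (1000.0, 6000.0),
    "CPU RAM (L2)": (70.0, 120.0),
    "SSD or NVRAM (L3)": (1.0, 2.0),
}


def scan_seconds(size_gb: float, bandwidth_gb_per_sec: float) -> float:
    """Time to stream `size_gb` once at the given bandwidth."""
    return size_gb / bandwidth_gb_per_sec


def speedup_range(fast: str, slow: str) -> tuple:
    """(min, max) bandwidth ratio of the `fast` tier over the `slow` tier."""
    fast_lo, fast_hi = TIERS[fast]
    slow_lo, slow_hi = TIERS[slow]
    return fast_lo / slow_hi, fast_hi / slow_lo


if __name__ == "__main__":
    size_gb = 100.0  # hypothetical working set, not from the slide
    for tier, (lo, hi) in TIERS.items():
        slowest = scan_seconds(size_gb, lo)
        fastest = scan_seconds(size_gb, hi)
        print(f"{tier}: {fastest:.3f}s to {slowest:.3f}s per full scan")
    print("Warm over cold:", speedup_range("CPU RAM (L2)", "SSD or NVRAM (L3)"))
    print("Hot over cold:", speedup_range("GPU RAM (L1)", "SSD or NVRAM (L3)"))
```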

Query Compilation with LLVM
Traditional DBs can be highly inefficient
• Each operator in SQL is treated as a separate function
• This incurs tremendous overhead and prevents vectorization
MapD compiles queries with LLVM to create one custom function
• Queries run at speeds approaching hand-written functions
• LLVM enables generic targeting of different architectures (GPUs, x86, ARM, etc.)
• Code can be generated to run a query on CPU and GPU simultaneously
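The contrast between operator-at-a-time execution and a single compiled function can be sketched in plain Python. This is only an analogy: it generates Python source rather than LLVM IR, and it is not MapD's implementation. The query shape (a filtered sum, roughly `SELECT SUM(sum_col) FROM rows WHERE filter_col > threshold`), the function names, and the sample data are all assumptions.

```python
import operator

# Comparison operators supported by this toy example.
OPS = {">": operator.gt, "<": operator.lt, "==": operator.eq}


def run_operator_at_a_time(rows, filter_col, op, threshold, sum_col):
    """Interpreter style: each operator is a separate pass with its own intermediate."""
    selected = [r for r in rows if OPS[op](r[filter_col], threshold)]  # filter
    projected = [r[sum_col] for r in selected]                         # project
    return sum(projected)                                              # aggregate


def compile_query(filter_col, op, threshold, sum_col):
    """Generate one specialized function for this exact query (one fused loop)."""
    if op not in OPS:
        raise ValueError(f"unsupported operator: {op}")
    source = (
        "def fused(rows):\n"
        "    total = 0\n"
        "    for r in rows:\n"
        f"        if r[{filter_col!r}] {op} {threshold!r}:\n"
        f"            total += r[{sum_col!r}]\n"
        "    return total\n"
    )
    namespace = {}
    exec(compile(source, "<generated query>", "exec"), namespace)
    return namespace["fused"]


if __name__ == "__main__":
    trips = [
        {"distance": 1, "fare": 5},
        {"distance": 3, "fare": 10},
        {"distance": 5, "fare": 20},
    ]
    fused = compile_query("distance", ">", 2, "fare")
    print(run_operator_at_a_time(trips, "distance", ">", 2, "fare"), fused(trips))
```

The fused version makes a single pass with no intermediate lists, which is the property the slide attributes to compiling the whole query into one custom function.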

MapD GPU-Powered Queries are Fastest
Mark Litwintschik benchmarked MapD vs. major CPU systems on a billion-row taxi data set and found it to be 6x to 12,500x faster than the fastest CPU databases.
MapD Comparative Query Acceleration*
System | Query 1 | Query 2 | Query 3 | Query 4
BrytlytDB & 2-node p2.16xlarge cluster | 36x | 47x | 25x | 12x
ClickHouse, Intel Core i5 4670K | 49x | 58x | 32x | 25x
Redshift, 6-node ds2.8xlarge cluster | 74x | 24x | 14x | 6x
BigQuery | 95x | 38x | 6x | 6x
Presto, 50-node n1-standard-4 cluster | 190x | 75x | 61x | 41x
Amazon Athena | 305x | 117x | 37x | 13x
Elasticsearch (heavily tuned) | 386x | 343x | n/a | n/a
Spark 2.1, 11 x m3.xlarge cluster w/ HDFS | 485x | 153x | 119x | 169x
Presto, 10-node n1-standard-4 cluster | 524x | 189x | 127x | 61x
Vertica, Intel Core i5 4670K | 685x | 607x | 203x | 132x
Elasticsearch (lightly tuned) | 1,642x | 1,194x | n/a | n/a
Presto, 5-node m3.xlarge cluster w/ HDFS | 1,667x | 735x | 388x | 159x
Presto, 50-node m3.xlarge cluster w/ S3 | 2,048x | 849x | 164x | 86x
PostgreSQL 9.5 & cstore_fdw | 7,238x | 3,302x | 1,424x | 722x
Spark 1.6, 5-node m3.xlarge cluster w/ S3 | 12,571x | 5,906x | 3,758x | 1,884x
*All speed comparisons are to the “MapD & 1 Nvidia Pascal DGX-1” benchmark
Source: http://tech.marksblogg.com/benchmarks.html

MapD Immerse
Lightning fast visual analytics for the MapD Core database

MapD Immerse: our hybrid approach
• Basic charts are frontend rendered using D3 and other related toolkits
• Scatterplots, pointmaps + polygons are backend rendered using the Iris Rendering Engine on GPUs
• Geo-Viz is composited over a frontend rendered basemap

DEMO

GOAI: End-to-end analytics on the GPU
GPU Open Analytics Initiative – Fusing Machine Learning and GPU Analytics

GOAI: End-to-end analytics on the GPU
GPU Open Analytics Initiative – Fusing Machine Learning and GPU Analytics
MapD + pymapd + H2O (GPU)
Pipelines compared: CPU ML Pipeline, GPU ML Pipeline, GPU ML Pipeline on GOAI (pointer in VRAM)
Stage labels in the diagram: Spark (CPU), Python (CPU), ML (CPU), MapD (GPU), Python (GPU), ML (GPU)
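One way to read the three pipelines is to count how often data has to cross between host and GPU memory. The toy model below does only that. The stage orderings are an assumed reading of the diagram, and the function and constant names are hypothetical. It does not call pymapd or H2O.

```python
def count_transfers(stages):
    """Count host<->device copies implied by consecutive stages on different devices."""
    devices = [device for _, device in stages]
    return sum(1 for a, b in zip(devices, devices[1:]) if a != b)


# Assumed stage orderings, inferred from the labels on the slide.
CPU_PIPELINE = [("Spark", "CPU"), ("Python", "CPU"), ("ML", "CPU")]
MIXED_PIPELINE = [("MapD", "GPU"), ("Python", "CPU"), ("ML", "GPU")]
GOAI_PIPELINE = [("MapD", "GPU"), ("Python", "GPU"), ("ML", "GPU")]

if __name__ == "__main__":
    for name, pipeline in [
        ("CPU ML pipeline", CPU_PIPELINE),
        ("GPU ML pipeline", MIXED_PIPELINE),
        ("GPU ML pipeline on GOAI", GOAI_PIPELINE),
    ]:
        print(f"{name}: {count_transfers(pipeline)} host/device transfers")
```

In this reading, the GOAI pipeline keeps every stage on the GPU and hands over a pointer in VRAM, so it avoids the two copies of the mixed pipeline while still running on GPU hardware.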

Get involved with GOAI
➢ GOAI website: gpuopenanalytics.com/
➢ GitHub: github.com/gpuopenanalytics
➢ Discussion Group: groups.google.com/forum/#!forum/gpuopenanalytics

DEMO

H2O.ai Overview

“Confidential and property of H2O.ai. All rights reserved”
Company Overview
• Founded: 2011. Venture-backed, debuted in 2012
• Products: H2O-3, Sparkling Water, H2O4GPU, Driverless AI
• Mission: Operationalize data science, and provide a platform for users to build beautiful data products
• Team: 70 employees, including distributed systems engineers doing machine learning, Kaggle-winning data scientists, and world-class visualization designers
• Headquarters: Mountain View, CA

10,000+ Companies Use H2O: Worldwide Community Adoption
Companies using H2O.ai: 2015: 3,810 • 2016: 6,427 • Now: 10,281 • 2017 goal: 14,000
H2O.ai users: 2015: 38,257 • 2016: 54,163 • Now: 97,620 • 2017 goal: 140,000
A.C. Nielsen • A1 Telekom Austria • AAPT • Abovenet Communications • Academic Administrative and Research Network • Academic Computer Centre Cyfronet H • Accelerated Data Works • Accenture • Accenture Services • Ace Ina Holdings • Ace International • Ace Telecom • Acton • Acxiom Corporation • Adamo Telecom Iberia • Administracion Nacional De Telecomunicaciones • Admiral Objekt Waesche & Arbeitskleidung • Adobe Systems • Adobe Systems India • Adsl Maroc Telecom • Advanced Cable Communications • Advanced Computer Solutions • Affecto • Afrihost-Dynamic • Ainet Telekommunikations-Netzwerk Betriebs • Air Bank A.S. • Air Liquide Sa • Airess Cesko • Akamai Technologies • Aktia Saastopankki Oy • Aktiv-I Szolgaltato • Al-Shahad Information Technology • Albert Einstein College of Medicine of Yeshiva University • Albert-Ludwigs-Universitaet Freiburg • Alexander & Alexander Information Technology • Algar Telecom • Aliyun Computing • Allbusiness.Com • Allianz Managed Operations & Services Se
Bell Canada • Beltelecom • Beyond The Network America • Bezeq International- • Bh - Tec • Bharti Airtel • Bibliotheque Nationale De France • Big Fish Games • Bigleaf Networks • Biglobe • Bilink • Bimeh Dormitory Sharif University of Technology • Bio-Rad Laboratories • Biocontrol • Bisiness Network Jv • Bite Communications • Biznet • Biznet Metronet • Blekinge Institute of Technology • Blue Line Infotech • Blueconnect • Boingo Wireless • Bol.Com Bv • Boots UK Retail • Boranet • Borlange Energi • Boston Scientific Corporation • Bouygues Telecom Division Mobile • Bouygues Telecom Sa • Brain Telecommunication • Bright House Networks • Brighthouse Networks Cfl Division • Brighthouse Networks Indianapolis • British Petroleum • British Sky Broadcasting • Broadriver Communication • Broadstripe • Brutele Sc • Bryant University
Case Western Reserve University • Catalina Marketing • Catalina Marketing Corporation • Cect-Chinacomm Communications • Cedars-Sinai Health Systems • Celgene Corporation • Center For Governmental Research • Centerbeam • Central Telegraph Public Joint-Stock • Centre De Calcul El-Khawarizmi - Cck • Centre For Advanced Computing • Centro De Tecnologia Da Informação Renato Archer • Ceom Israel • Cerfnet • Cerner Corporation • Certara USA • Ceu • Cgi Group • Champaign Telephone • Charles University • Charlesbrauer • Charter Communications • Chegg • Chengdu West Dimension Digital Technology • Cheonanjeonhwakukjang • Chico Board of Trade • China Digital Kingdom Technology • China Education and Research Network • Chinatelecom Group Beijing Co • Chongqing Times Newspaper Office • Chs - Bna Lan • Chunghwa Telecom Data Communication Business Group • Cik Telecom • Cisco • Cisco Systems • Cisco Systems Ironport Division • Citadel Investment Group L.L.C. • Citrix Systems • City University
Delft University of Technology Network • Delhi Technical University(Dce) • Deloitte • Deloitte Services • Deloitte Touche Tohmatsu Services • Deloitte and Touch Regional Consulting Services • Delphon Industries • Delta Dental Plan of Michigan • Delta Leasedline Network • Deluxe Corporation • Den Networks • Dena • Deutsche Telekom • Deutsches Reisebuero • Develon • Dhirubhai Ambani Institute of Information • Dialog Axiata • Digi Tavkozlesi Es Szolgaltato • Digia • Digital Entertainment • Digital Hosting Technology • Digital Network Associates - Franchisee • Digital Ocean • Digital Realm • Digital River • Digital-Entertainment-Industry-Development-Co--Zhongshan Zho • Digitalocean Cloud • Direct Supply • Discoveries In Sight • Dishnet Wireless • Distributel Communications • Disy Informationssysteme • Diverge Consulting • Dna Oy • Doclernet • Dongbeicaijingdaxue-Dl-Ln • Doorway As • Dotomi • Drivetime
Enbridge Pipelines • Agency For Science Technology and Research • End-User Numericable • Energy Sciences Network • Enom Incorporated • Ensync Business Solutions Pty • Entanet International • Enterprise Teaming • Enzu • Eotvos Lorand University of Sciences • Epam Systems • Epm Telecomunicaciones E.S.P. • Epsilon Data Management Dba • Equant • Equinox Consulting • Erasmus Mc • Erasmus University Rotterdam • Ericsson Business Communications • Ericsson Network Systems • Escout Consulting • Espn • Estate Valuations and Pricing Systems • Etapa Ep • Etex Communications • Etheric Networks • Ethio Telecom • Ethz Swiss Federal Institute of Technology Zurich • Etisalat Lanka (Private) • European Bioinformatics Institute • Evergy • Excell Media • Exe2 Newton Abbot • Exetel Act Dsl • Exponential-E
FPL Fibernet • Facebook • Fachhochschule Dortmund • Fachhochschule Nordwestschweiz • Faculty of Sciences University of Lisbon

H2O.ai Strongly Positioned in Key Analyst Reports
• H2O.ai is a Visionary in the Gartner Magic Quadrant for Data Science Platforms
• H2O.ai is a Strong Performer in the Forrester Predictive Analytics & Machine Learning
• H2O ranked #1 for Advanced Prototyping in Gartner Critical Capabilities for Data Science
Published: January 2017
Analyst quotes:
• “Overall customer satisfaction is very high.”
• “H2O is especially suited to IoT edge and device scenarios.”
• “H2O had the highest reference customer analytics support score of all the vendors.”
• “H2O.ai has significant adoption by large enterprises such as Macy’s, Comcast, and Capital One.”
• “H2O.ai is best known for developing open source, cluster-distributed ML algorithms at a time (2011) when big data demanded them, but no one else had them.”
• “H2O received an outstanding score in machine learning, largely driven by its popular best-in-class machine-learning implementations”
• “It also scored very highly for flexibility, extensibility and openness, as well as for delivery.”

AI in Financial Services
Wholesale / Commercial Banking
• Know Your Customers (KYC)
• Anti-Money Laundering (AML)
Card/Payments Business
• Transaction Fraud
• Real-time Targeting
• Credit Risk Scoring
• In-Context Promotion
Retail Banking
• Deposit Fraud
• Customer Churn Prediction
• Auto-Loan
IT Infrastructure
• Security Cyberlake
• DoS Detection and Protection
• Master Data Management

AI in Healthcare
• Flu Season Prediction
• Personalized Drug Matching
• Medical Claim Fraud Detection
• Emergency Room and Hospital Management
• Drug Discovery
• Remote Patient Monitoring
• Early Cancer Detection / Oncology
• Medical Imaging and Diagnostics
• Product Recommendation

Platforms and Product

H2O.ai Makes A Difference as an AI Platform
• Open Source
• Cloud Neutral / Big Data Ecosystem
• Flexible Interface: H2O Flow
• GPU Enablement
• Smart and Fast Algorithms
Rapid Model Deployment
• Highly portable models deployed in Java (POJO) and Model Object, Optimized (MOJO)
• Automated and streamlined scoring service deployment with REST API
Interpretability
• Reason code for predictions
• Variable importance for models
• Visual Intelligence

H2O GPU Solutions
H2O4GPU
• 100% open source
• Enterprise support
• Algorithms: GLM, Gradient-boosted Tree, K-means, Random Forest
• Coming soon: Linear SVM, PCA, Nearest Neighbors, Multi-GPU / multi-node
Driverless AI
• Enterprise license
• Solution functionality: Automatic Feature Engineering, Automated ML, Machine Learning Interpretability (MLI), Automatic Visualization
• Coming soon: Time Series / Text input, Model Stacking

Typical Enterprise Machine Learning Workflow
Data Integration + Data Quality & Transformation -> Modeling Table (Features, Target) -> Model Building -> Model

Typical Enterprise Machine Learning Workflow, with Driverless AI
Data Integration + Data Quality & Transformation -> Modeling Table (Features, Target) -> Model Building -> Model

Driverless AI: Kaggle Grandmaster in a Box
Diagram labels: Data • Visual • Model Interpretation • Auto DL • Distributed Multi-GPU • Model Repository • Deploy • Model Fitness • data.table • Automatic Feature Engineering • Pipeline Export • English Explanations • Reason Codes • K-Lime, LOCO, Partial Dependence • Driverless AI Recipes • H2O4GPU

H2O4GPU: Speed
• GPU acceleration to achieve up to 40x speedups vs CPU
• Multi-GPU: XGBoost, GLM, K-Means and more
• Achieve best performance in shortest time

Driverless AI: Accuracy
• Automatic feature engineering to increase accuracy - AlphaGo for AI
• Automatic Kaggle Grandmaster recipes in a box for solving a wide variety of use-cases
• Automatic machine learning to find and tune the right ensemble of models
[Chart: relative error, lower is better (axis 0 to 1), on the Allstate, BNP Paribas, Amazon, Homesite, and Otto Group datasets, comparing Kaggle Grandmaster Best, AutoDL, and GBM Baseline. Preliminary results - untuned, single model.]

Interpretability
• Interpretability for debugging, not just for regulators
• Get reason codes and model interpretability in plain English
• K-Lime, LOCO, partial dependence and more
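Two of the techniques named above, reason codes in the spirit of LOCO and partial dependence, can be sketched in a few lines. This is a simplified illustration and not Driverless AI's implementation. The linear `model` is a hypothetical stand-in for a trained model, the feature names are invented, and the LOCO-style step replaces a feature with its column mean instead of refitting without it.

```python
def model(row):
    """Hypothetical scoring function standing in for a trained model."""
    return 0.5 * row["income"] - 2.0 * row["late_payments"] + 0.1 * row["age"]


def column_means(rows):
    """Mean of every feature across the data set."""
    return {key: sum(r[key] for r in rows) / len(rows) for key in rows[0]}


def loco_reason_codes(row, rows):
    """Per-row contributions: prediction change when one feature is neutralized.

    Simplification: the feature is replaced by its mean rather than dropped
    and the model refit.
    """
    means = column_means(rows)
    base = model(row)
    contributions = {}
    for feature in row:
        neutral = dict(row, **{feature: means[feature]})
        contributions[feature] = base - model(neutral)
    # Largest absolute contribution first, like a ranked list of reason codes.
    return sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)


def partial_dependence(rows, feature, value):
    """Average prediction when `feature` is forced to `value` for every row."""
    return sum(model(dict(r, **{feature: value})) for r in rows) / len(rows)


if __name__ == "__main__":
    data = [
        {"income": 10, "late_payments": 0, "age": 30},
        {"income": 20, "late_payments": 4, "age": 50},
    ]
    for feature, delta in loco_reason_codes(data[1], data):
        print(f"{feature}: {delta:+.2f}")
    print(partial_dependence(data, "late_payments", 0))
```

The ranked contributions are what a plain-English reason code would be generated from, for example "score lowered mainly by late_payments".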

Automatic Visualization
• Statistically sound and relevant visualizations generated automatically to highlight important characteristics in the data
• Automatic Scagnostics and other visualizations to generate the most relevant visualizations for each dataset
• Can handle billions of rows using the aggregator function

GPU OPEN ANALYTICS INITIATIVE
github.com/gpuopenanalytics
GPU Data Frame (GDF)
Ingest/Parse -> Exploratory Analysis -> Feature Engineering -> ML/DL Algorithms -> Grid Search -> Scoring -> Model Export

H2O4GPU Roadmap
Timeline columns: GTC, Q2 2017, Q3 2017, Q4 2017
Features:
• GLM (Single GPU)
• Python API for training & scoring
• Python .whl installation file
• GBM (Single GPU)
• Inference on GPU (GLM)
• GLM (Multi GPU - Multiple Models in Parallel)
• GBM (Multi GPU - Multiple Models in Parallel)
• Inference on GPU (GBM)
• k-Means Clustering, Nearest Neighbors - Single GPU
• PCA (Single GPU)
• Quantiles (Single GPU)
• GLM (Multi GPU - Single Model)
• Sort (Single GPU)
• GBM (Multi GPU - Single Model)
• Random Forest (Multi GPU - Single Model)
• Kalman Filters
• Sort, Quantiles (Multi GPU)
• SVM (Single GPU)
• Connectors to GPU Open AI Data Frames
