H2O & MapD Meetup at NVIDIA: GPU Accelerated AI

OmniSci
September 11, 2017

The GPU Open Analytics Initiative (GOAI) was formed to create common data frameworks enabling developers and statistical researchers to accelerate data science on GPUs.

GOAI will foster the development of a data science ecosystem on GPUs by allowing resident applications to interchange data seamlessly and efficiently.

Special thanks to NVIDIA for hosting this meetup.

Speaker Bios:

Vinod Iyengar is the Director of Business Development and Partnerships at H2O.ai. Vinod comes with over 7 years of marketing and data science experience in multiple startups. He was the founding employee for his previous startup, Activehours, where he helped build the product and bootstrap the user acquisition with growth hacking. He has seen the user base for his companies grow from scratch to millions of customers. He has built models to score leads, reduce churn, increase conversion and prevent fraud, among many other use cases.

When he is not busy hacking, Vinod loves painting and reading. He is a huge foodie and will eat anything that doesn’t crawl, swim or move.

Bill Maimone is the VP of Engineering at MapD. Bill is responsible for leading the engineering team in architecting and delivering the company’s suite of database, visualization and GIS solutions. Prior to MapD, Bill was the VP of Engineering at Anaplan and held similar roles with Salesforce and Actian. Bill began his career at Oracle, where he spent two decades, finishing as a Vice President with responsibility for over five hundred members of the R&D team across four continents. He holds an M.S. and a B.S. in Computer Science and a Bachelor of Science in Journalism from the Massachusetts Institute of Technology.

Transcript

  1. H2O & MapD: GPU Accelerated AI Meetup at NVIDIA

    Bill Maimone, VP of Engineering at MapD
    Vinod Iyengar, Director of Business Development and Partnerships at H2O.ai
    August 29, 2017
  2. Who is MapD?

    MapD was incubated in the MIT CSAIL database group under the advisory of Michael Stonebraker and Sam Madden (Vertica). MapD has captured the imagination of some of the most sophisticated investors in Silicon Valley and beyond.

    “It's completely amazing to access databases so large completely in-memory and to interact with it, create graphs out of it, query it with AI, visualize it, all in real time. Completely revolutionary stuff.” Jensen Huang, CEO

    Confidential & Proprietary
  3. GPUs offer a way forward

    • GPU Processing Power: 50% per year
    • Data Growth: 40% per year
    • CPU Processing Power: 20% per year
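
The three annual rates on the slide above diverge quickly once compounded. The sketch below is a minimal illustration that assumes each rate stays constant from year to year; the slide does not state a time horizon, so the five-year span and the function names are assumptions made for the example.

```python
# Annual growth rates as quoted on the slide.
RATES = {
    "GPU processing power": 0.50,
    "Data": 0.40,
    "CPU processing power": 0.20,
}


def compound(rate_per_year: float, years: int) -> float:
    """Growth factor after `years` at a constant annual rate."""
    return (1.0 + rate_per_year) ** years


def growth_table(years: int) -> dict:
    """Growth factor per category after `years` (constant rates assumed)."""
    return {name: compound(rate, years) for name, rate in RATES.items()}


if __name__ == "__main__":
    # The 5-year horizon is an illustrative assumption, not from the slide.
    for name, factor in growth_table(5).items():
        print(f"{name}: {factor:.2f}x after 5 years")
```

Under these assumptions data volume outgrows CPU capacity every year, while GPU capacity stays ahead of data growth, which is the argument the slide title makes.
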
  4. Cores and memory define the difference

    • GPU Processing: 5,120 Cores
    • CPU Processing (Traditional): 24 Cores
  5. MapD: software optimized for the fastest hardware

    100x Faster Queries + Speed of Thought Visualization

    • MapD Core: A fast, relational, column store database powered by GPUs
    • MapD Immerse: A visual analytics engine that leverages the speed + rendering capabilities of MapD Core
  6. The evolution of analytics

    • ANALYTICS 1.0: 1990s
    • ANALYTICS 2.0: 2000s
    • ANALYTICS 3.0 (ACCELERATED, PREDICTIVE, DYNAMIC): 2017

    GPUs will be as transformative to Analytics as Broadband was to the Internet
  7. MapD Core

    The world's fastest in-memory GPU database powers the world's most immersive data exploration experience
  8. Keeping Data Close to Compute

    MapD Core: Performance starts with memory management

    • GPU RAM (L1): 24GB to 256GB, 1000-6000 GB/sec. Hot Data: Speedup = 1500x to 5000x Over Cold Data
    • CPU RAM (L2): 32GB to 3TB, 70-120 GB/sec. Warm Data: Speedup = 35x to 120x Over Cold Data
    • SSD or NVRAM STORAGE (L3): 250GB to 20TB, 1-2 GB/sec. Cold Data

    Diagram labels: COMPUTE LAYER, STORAGE LAYER, Data Lake/Data Warehouse/System Of Record
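
To make the bandwidth figures for the three memory tiers concrete, the sketch below estimates how long one full pass over a dataset would take at each tier's quoted bandwidth. It is a rough model only: the 100 GB dataset size and the function names are assumptions for illustration, and it ignores capacity limits, latency and parallelism across devices.

```python
# (low, high) bandwidth in GB/sec for each tier, as quoted on the slide.
TIERS_GB_PER_SEC = {
    "GPU RAM (L1)": (1000.0, 6000.0),
    "CPU RAM (L2)": (70.0, 120.0),
    "SSD or NVRAM (L3)": (1.0, 2.0),
}


def scan_seconds(dataset_gb: float, bandwidth_gb_per_sec: float) -> float:
    """Time to read the whole dataset once at a given bandwidth."""
    return dataset_gb / bandwidth_gb_per_sec


def scan_time_ranges(dataset_gb: float) -> dict:
    """Per tier: (fastest, slowest) full-scan time in seconds."""
    return {
        tier: (scan_seconds(dataset_gb, high), scan_seconds(dataset_gb, low))
        for tier, (low, high) in TIERS_GB_PER_SEC.items()
    }


if __name__ == "__main__":
    # 100 GB is an illustrative dataset size, not a figure from the slide.
    for tier, (fastest, slowest) in scan_time_ranges(100.0).items():
        print(f"{tier}: {fastest:.3f}s to {slowest:.3f}s per full scan")
```
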
  9. Query Compilation with LLVM

    Traditional DBs can be highly inefficient
    • each operator in SQL treated as a separate function
    • incurs tremendous overhead and prevents vectorization

    MapD compiles queries w/LLVM to create one custom function
    • Queries run at speeds approaching hand-written functions
    • LLVM enables generic targeting of different architectures (GPUs, X86, ARM, etc.)
    • Code can be generated to run query on CPU and GPU simultaneously
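
The contrast between operator-at-a-time execution and a single generated function can be sketched in plain Python. This is an analogy only: it uses Python's built-in `compile` and `exec` in place of LLVM, and the query, the row layout and every function name are hypothetical. It does not show how MapD generates code.

```python
# Operator-at-a-time: each operator is a separate function that
# materializes an intermediate list before the next operator runs.
def op_filter(rows, predicate):
    return [r for r in rows if predicate(r)]


def op_project(rows, expression):
    return [expression(r) for r in rows]


def op_sum(values):
    return sum(values)


def run_interpreted(rows):
    """SUM(price * qty) WHERE qty > 2, one operator call at a time."""
    kept = op_filter(rows, lambda r: r["qty"] > 2)
    values = op_project(kept, lambda r: r["price"] * r["qty"])
    return op_sum(values)


def compile_query(filter_expr: str, value_expr: str):
    """Generate one fused function for the whole query.

    The expression strings are trusted input in this sketch; a real
    system would build code from a parsed query plan instead.
    """
    source = (
        "def query(rows):\n"
        "    total = 0\n"
        "    for r in rows:\n"
        f"        if {filter_expr}:\n"
        f"            total += {value_expr}\n"
        "    return total\n"
    )
    namespace = {}
    exec(compile(source, "<generated query>", "exec"), namespace)
    return namespace["query"]


ROWS = [
    {"price": 10, "qty": 1},
    {"price": 5, "qty": 3},
    {"price": 2, "qty": 4},
]

fused = compile_query('r["qty"] > 2', 'r["price"] * r["qty"]')

if __name__ == "__main__":
    print(run_interpreted(ROWS), fused(ROWS))
```

Both paths return the same answer; the fused version makes a single pass with no intermediate lists, which is the property the slide attributes to compiled queries.
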
  10. MapD GPU-Powered Queries are Fastest

    Mark Litwintschik benchmarked MapD vs. major CPU systems on a billion-row taxi data set and found it to be 6x to 12,500x faster than the fastest CPU databases.

    MapD Comparative Query Acceleration*

    System                                      Query 1   Query 2   Query 3   Query 4
    BrytlytDB & 2-node p2.16xlarge cluster      36x       47x       25x       12x
    ClickHouse, Intel Core i5 4670K             49x       58x       32x       25x
    Redshift, 6-node ds2.8xlarge cluster        74x       24x       14x       6x
    BigQuery                                    95x       38x       6x        6x
    Presto, 50-node n1-standard-4 cluster       190x      75x       61x       41x
    Amazon Athena                               305x      117x      37x       13x
    Elasticsearch (heavily tuned)               386x      343x      n/a       n/a
    Spark 2.1, 11 x m3.xlarge cluster w/ HDFS   485x      153x      119x      169x
    Presto, 10-node n1-standard-4 cluster       524x      189x      127x      61x
    Vertica, Intel Core i5 4670K                685x      607x      203x      132x
    Elasticsearch (lightly tuned)               1,642x    1,194x    n/a       n/a
    Presto, 5-node m3.xlarge cluster w/ HDFS    1,667x    735x      388x      159x
    Presto, 50-node m3.xlarge cluster w/ S3     2,048x    849x      164x      86x
    PostgreSQL 9.5 & cstore_fdw                 7,238x    3,302x    1,424x    722x
    Spark 1.6, 5-node m3.xlarge cluster w/ S3   12,571x   5,906x    3,758x    1,884x

    *All speed comparisons are to the “MapD & 1 Nvidia Pascal DGX-1” benchmark
    Source: http://tech.marksblogg.com/benchmarks.html
  11. MapD Immerse: our hybrid approach

    • Basic charts are frontend rendered using D3 and other related toolkits
    • Scatterplots, pointmaps + polygons are backend rendered using the Iris Rendering Engine on GPUs
    • Geo-Viz is composited over a frontend rendered basemap
  12. GOAI: End-to-end analytics on the GPU

    GPU Open Analytics Initiative – Fusing Machine Learning and GPU Analytics
  13. GOAI: End-to-end analytics on the GPU

    GPU Open Analytics Initiative – Fusing Machine Learning and GPU Analytics

    MapD + pymapd + H2O (GPU)

    [Diagram comparing three pipelines: CPU ML Pipeline, GPU ML Pipeline, and GPU ML Pipeline on GOAI (Pointer in VRAM). Stage labels: Python (CPU), Spark (CPU), ML (CPU), MapD (GPU), Python (GPU), ML (GPU).]
  14. Get involved with GOAI

    ➢ GOAI website: gpuopenanalytics.com/
    ➢ GitHub: github.com/gpuopenanalytics
    ➢ Discussion Group: groups.google.com/forum/#!forum/gpuopenanalytics

    MapD Proprietary.
  15. Company Overview

    Founded: 2011 Venture-backed, debuted in 2012

    Products
    • H2O-3
    • Sparkling Water
    • H2O4GPU
    • Driverless AI

    Mission: Operationalize Data Science, and provide a platform for users to build beautiful data products

    Team: 70 employees
    • Distributed Systems Engineers doing Machine Learning
    • Kaggle Winning Data Scientists
    • World-class visualization designers

    Headquarters: Mountain View, CA

    “Confidential and property of H2O.ai. All rights reserved”
  16. A.C. Nielsen
 A1 Telekom Austria
 AAPT
 Abovenet Communications
 Academic Administrative and Research Network
 Academic Computer Centre Cyfronet H
 Accelerated Data Works
 Accenture
 Accenture Services
 Ace Ina Holdings
 Ace International
 Ace Telecom
 Acton
 Acxiom Corporation
 Adamo Telecom Iberia
 Administracion Nacional De Telecomunicaciones
 Admiral Objekt Waesche & Arbeitskleidung
 Adobe Systems
 Adobe Systems India
 Adsl Maroc Telecom
 Advanced Cable Communications
 Advanced Computer Solutions
 Affecto
 Afrihost-Dynamic
 Ainet Telekommunikations-Netzwerk Betriebs
 Air Bank A.S.
 Air Liquide Sa
 Airess Cesko
 Akamai Technologies
 Aktia Saastopankki Oy
 Aktiv-I Szolgaltato
 Al-Shahad Information Technology
 Albert Einstein College of Medicine of Yeshiva University
 Albert-Ludwigs-Universitaet Freiburg
 Alexander & Alexander Information Technology
 Algar Telecom
 Aliyun Computing
 Allbusiness.Com
 Allianz Managed Operations & Services Se
 Bell Canada
 Beltelecom
 Beyond The Network America
 Bezeq International-
 Bh - Tec
 Bharti Airtel
 Bibliotheque Nationale De France
 Big Fish Games
 Bigleaf Networks
 Biglobe
 Bilink
 Bimeh Dormitory Sharif University of Technology
 Bio-Rad Laboratories
 Biocontrol
 Business Network Jv
 Bite Communications
 Biznet
 Biznet Metronet
 Blekinge Institute of Technology
 Blue Line Infotech
 Blueconnect
 Boingo Wireless
 Bol.Com Bv
 Boots UK Retail
 Boranet
 Borlange Energi
 Boston Scientific Corporation
 Bouygues Telecom Division Mobile
 Bouygues Telecom Sa
 Brain Telecommunication
 Bright House Networks
 Brighthouse Networks Cfl Division
 Brighthouse Networks Indianapolis
 British Petroleum
 British Sky Broadcasting
 Broadriver Communication
 Broadstripe
 Brutele Sc
 Bryant University
 Case Western Reserve University
 Catalina Marketing
 Catalina Marketing Corporation
 Cect-Chinacomm Communications
 Cedars-Sinai Health Systems
 Celgene Corporation
 Center For Governmental Research
 Centerbeam
 Central Telegraph Public Joint-Stock
 Centre De Calcul El-Khawarizmi - Cck
 Centre For Advanced Computing
 Centro De Tecnologia Da Informa O Renato Archer
 Ceom Israel
 Cerfnet
 Cerner Corporation
 Certara USA
 Ceu
 Cgi Group
 Champaign Telephone
 Charles University
 Charlesbrauer
 Charter Communications
 Chegg
 Chengdu West Dimension Digital Technology
 Cheonanjeonhwakukjang
 Chico Board of Trade
 China Digital Kingdom Technology
 China Education and Research Network
 Chinatelecom Group Beijing Co
 Chongqing Times Newspaper Office
 Chs - Bna Lan
 Chunghwa Telecom Data Communication Business Group
 Cik Telecom
 Cisco
 Cisco Systems
 Cisco Systems Ironport Division
 Citadel Investment Group L.L.C.
 Citrix Systems
 City University
 Delft University of Technology Network
 Delhi Technical University(Dce)
 Deloitte
 Deloitte Services
 Deloitte Touche Tohmatsu Services
 Deloitte and Touch Regional Consulting Services
 Delphon Industries
 Delta Dental Plan of Michigan
 Delta Leasedline Network
 Deluxe Corporation
 Den Networks
 Dena
 Deutsche Telekom
 Deutsches Reisebuero
 Develon
 Dhirubhai Ambani Institute of Information
 Dialog Axiata
 Digi Tavkozlesi Es Szolgaltato
 Digia
 Digital Entertainment
 Digital Hosting Technology
 Digital Network Associates - Franchisee
 Digital Ocean
 Digital Realm
 Digital River
 Digital-Entertainment-Industry-Development-Co--Zhongshan Zho
 Digitalocean Cloud
 Direct Supply
 Discoveries In Sight
 Dishnet Wireless
 Distributel Communications
 Disy Informationssysteme
 Diverge Consulting
 Dna Oy
 Doclernet
 Dongbeicaijingdaxue-Dl-Ln
 Doorway As
 Dotomi
 Drivetime
 Enbridge Pipelines
 Agency For Science Technology and Research
 End-User Numericable
 Energy Sciences Network
 Enom Incorporated
 Ensync Business Solutions Pty
 Entanet International
 Enterprise Teaming
 Enzu
 Eotvos Lorand University of Sciences
 Epam Systems
 Epm Telecomunicaciones E.S.P.
 Epsilon Data Management Dba
 Equant
 Equinox Consulting
 Erasmus Mc
 Erasmus University Rotterdam
 Ericsson Business Communications
 Ericsson Network Systems
 Escout Consulting
 Espn
 Estate Valuations and Pricing Systems
 Etapa Ep
 Etex Communications
 Etheric Networks
 Ethio Telecom
 Ethz Swiss Federal Institute of Technology Zurich
 Etisalat Lanka (Private)
 European Bioinformatics Institute
 Evergy
 Excell Media
 Exe2 Newton Abbot
 Exetel Act Dsl
 Exponential-E
 FPL Fibernet
 Facebook
 Fachhochschule Dortmund
 Fachhochschule Nordwestschweiz
 Faculty of Sciences University of Lisbon

    10,000+ Companies use H2O: World Wide Community Adoption

    Companies Using H2O.ai (chart): 3,810 (2015), 6,427 (2016), 10,281 (Now), 14,000 (2017 Goal)
    H2O.ai Users (chart): 38,257 (2015), 54,163 (2016), 97,620 (Now), 140,000 (2017 Goal)
  17. H2O.ai Strongly Positioned in Key Analyst Reports

    H2O.ai is a Visionary in the Gartner Magic Quadrant for Data Science Platforms
    • “Overall customer satisfaction is very high.”
    • “H2O is especially suited to IoT edge and device scenarios.”
    • “H2O had the highest reference customer analytics support score of all the vendors.”
    • “H2O.ai has significant adoption by large enterprises such as Macy’s, Comcast, and Capital One.”
    • “H2O.ai is best known for developing open source, cluster-distributed ML algorithms at a time (2011) when big data demanded them, but no one else had them.”

    H2O.ai is a Strong Performer in the Forrester Predictive Analytics & Machine Learning

    H2O ranked #1 for Advanced Prototyping in Gartner Critical Capabilities for Data Science
    Publish: January 2017
    • “H2O received an outstanding score in machine learning, largely driven by its popular best-in-class machine-learning implementations”
    • “It also scored very highly for flexibility, extensibility and openness, as well as for delivery.”
  18. AI in Financial Services

    Wholesale / Commercial Banking
    • Know Your Customers (KYC)
    • Anti-Money Laundering (AML)

    Card/Payments Business
    • Transaction Frauds
    • Real-time Targeting
    • Credit Risk Scoring
    • In-Context Promotion

    Retail Banking
    • Deposit Fraud
    • Customer Churn Prediction
    • Auto-Loan

    IT Infrastructure
    • Security Cyberlake
    • DoS Detection and Protection
    • Master Data Management
  19. AI in Healthcare

    • Flu Season Prediction
    • Personalized Drug Matching
    • Medical Claim Fraud Detection
    • Emergency Room and Hospital Management
    • Drug Discovery
    • Remote Patient Monitoring
    • Early Cancer Detection / Oncology
    • Medical Imaging and Diagnostics
    • Product Recommendation
  20. H2O.ai Makes A Difference as an AI Platform

    Cloud Neutral Big Data Ecosystem, Open Source, Flexible Interface, Interpretability, GPU Enablement, Rapid Model Deployment, Smart and Fast Algorithms, H2O Flow

    • Highly portable models deployed in Java (POJO) and Model Object Optimized (MOJO)
    • Automated and streamlined scoring service deployment with REST API
    • Reason code for predictions
    • Variable importance for models
    • Visual Intelligence
  21. H2O GPU Solutions

    H2O4GPU
    • 100% open source
    • Enterprise Support

    Algorithms
    • GLM
    • Gradient-boosted Tree
    • K-means
    • Random Forest

    Coming Soon
    • Linear SVM
    • PCA
    • Nearest Neighbors
    • Multi-GPU / multi-node

    Driverless AI
    • Enterprise License

    Solution functionality
    • Automatic Feature Engineering
    • Automated ML
    • Machine Learning Interpretability (MLI)
    • Automatic Visualization

    Coming Soon
    • Time Series / Text input
    • Model Stacking
  22. Typical Enterprise Machine Learning Workflow

    [Diagram labels: Data Integration + Data Quality & Transformation, Modeling Table, Features, Target, Model Building, Model]
  23. Typical Enterprise Machine Learning Workflow

    [Diagram labels: Data Integration + Data Quality & Transformation, Modeling Table, Features, Target, Model Building, Model, Driverless AI]
  24. [Diagram labels as extracted: Data; VISUAL MODEL INTERPRETATION; AUTO DL; Distributed Multi-GPU; Model Repository; Deploy; MODEL FITNESS; data.table; Kaggle Grandmaster in a Box; Automatic Feature Engineering; Pipeline Export; English Explanations; Reason Codes; K-Lime, LOCO, Partial Dependence; Driverless AI; Recipes; H2O4GPU]
  25. Speed (H2O4GPU)

    • GPU acceleration to achieve up to 40x speedups vs CPU
    • Multi GPU - XGBoost, GLM, K-Means and more
    • Achieve best performance in shortest time
  26. Accuracy (Driverless AI)

    • Automatic feature engineering to increase accuracy - AlphaGo for AI
    • Automatic Kaggle Grandmaster recipes in a box for solving wide variety of use-cases
    • Automatic machine learning to find and tune the right ensemble of models

    [Chart: Relative error (Lower is Better) on Allstate, BNP Paribas, Amazon, Homesite and Otto Group. Series: Kaggle Grandmaster Best, AutoDL, GBM Baseline. Preliminary results - untuned, single model.]
  27. Interpretability

    • Interpretability for debugging, not just for regulators
    • Get reason codes and model interpretability in plain English
    • K-Lime, LOCO, partial dependence and more
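
Of the techniques named on the interpretability slide, partial dependence is the simplest to sketch: force one feature to each value on a grid, keep every other feature as observed, and average the model's predictions. The code below is a generic textbook formulation, not Driverless AI's implementation; `toy_model`, the feature names and the sample rows are hypothetical.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction when `feature` is forced to each grid value.

    `predict` maps one row (a dict of feature values) to a number.
    """
    averages = []
    for value in grid:
        # Override only the feature of interest; keep the rest as observed.
        predictions = [predict({**row, feature: value}) for row in rows]
        averages.append(sum(predictions) / len(predictions))
    return averages


def toy_model(row):
    """Hypothetical stand-in for a fitted model."""
    return 2.0 * row["x1"] + 0.5 * row["x2"]


ROWS = [
    {"x1": 0.0, "x2": 2.0},
    {"x1": 1.0, "x2": 4.0},
]

if __name__ == "__main__":
    print(partial_dependence(toy_model, ROWS, "x1", [0.0, 1.0, 2.0]))
```

For this linear toy model the curve rises by 2.0 per unit of `x1`, matching the coefficient, which is the kind of sanity check the "interpretability for debugging" bullet points at.
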
  28. Automatic Visualization

    • Statistically sound and relevant visualizations generated automatically to highlight important characteristics in the data.
    • Automatic Scagnostics and other visualizations to generate the most relevant visualizations for each dataset
    • Can handle billions of rows using the aggregator function
  29. GPU OPEN ANALYTICS INITIATIVE

    github.com/gpuopenanalytics

    GPU Data Frame (GDF): Ingest/Parse, Exploratory Analysis, Feature Engineering, ML/DL Algorithms, Grid Search, Scoring, Model Export
  30. H2O4GPU Roadmap

    Columns: FEATURE, GTC, Q2 2017, Q3 2017, Q4 2017

    Features:
    • GLM (Single GPU)
    • Python API for training & scoring
    • Python .whl installation file
    • GBM (Single GPU)
    • Inference on GPU (GLM)
    • GLM (Multi GPU - Multiple Models in Parallel)
    • GBM (Multi GPU - Multiple Models in Parallel)
    • Inference on GPU (GBM)
    • k-Means Clustering, Nearest Neighbors - Single GPU
    • PCA (Single GPU)
    • Quantiles (Single GPU)
    • GLM (Multi GPU - Single Model)
    • Sort (Single GPU)
    • GBM (Multi GPU - Single Model)
    • Random Forest (Multi GPU - Single Model)
    • Kalman Filters
    • Sort, Quantiles (Multi GPU)
    • SVM (Single GPU)
    • Connectors to GPU Open AI Data Frames
  31. Links & Resources

    • Driverless AI Webinar
    • Interpretability Webinar
    • Josh Patterson Blog
    • Andy Steinbach Blog