OpenTalks.AI - Дмитрий Конягин, AI Today and Tomorrow

OpenTalks.AI
February 14, 2019

Transcript

  1. FORCES SHAPING COMPUTING

    [Chart: CPU performance, 1980 to 2020, log scale (10^3 to 10^7). Annotation: "Beyond Moore’s Law".]
  2. FORCES SHAPING COMPUTING

    [Chart: GPU and CPU performance, 1980 to 2020, log scale (10^3 to 10^7). Annotations: "Beyond Moore’s Law", "1000X every 10 years", "Accelerated computing".]
  3. FORCES SHAPING COMPUTING

    [Chart: GPU and CPU performance, 1980 to 2020, log scale (10^3 to 10^7). Annotations: "Beyond Moore’s Law", "1000X every 10 years", "Accelerated computing".]

    Computers writing software: Data → Deep neural network → Program.
  4. AI TRANSFORMING EVERY INDUSTRY

    - Infrastructure: 50% reduction in emergency road repair costs
    - IoT: >$6M/year savings and reduced risk of outage
    - Healthcare: >80% accuracy and immediate alerts to radiologists
  5. AI IS DRIVING CUSTOMER ENGAGEMENT

    Use cases: speech/voice, images/video, recommendations.

    - 20% higher accuracy for a better user experience
    - 50% improvement in user engagement
    - 60X better latency for a real-time user experience
  6. NEURAL NETWORK COMPLEXITY IS EXPLODING

    Bigger and more compute-intensive. [Charts: model complexity measured as GOP × bandwidth.]

    - Speech, 2013 to 2018: DeepSpeech, DeepSpeech 2, DeepSpeech 3; 30X
    - Image, 2011 to 2017: AlexNet, GoogLeNet, Inception-v2, Inception-v4, ResNet-50; 350X
    - Translation, 2014 to 2018: GNMT, OpenNMT, MoE; 10X
  7. CAMBRIAN EXPLOSION

    - Convolutional networks: ReLU, encoder/decoder, BatchNorm, dropout, pooling, concat
    - Recurrent networks: GRU, LSTM, CTC, beam search, WaveNet, attention
    - Generative adversarial networks: Speech Enhancement GAN, Coupled GAN, Conditional GAN, MedGAN, 3D-GAN
    - Reinforcement learning: DQN, simulation, DDPG
    - New species: neural collaborative filtering, mixture of experts, block-sparse LSTM, capsule nets
  8. DEEP LEARNING IS A NEW COMPUTING MODEL

    - Training: billions of trillions of operations. GPUs train larger models and accelerate time to market.
    - Inference (datacenter and device): tens of billions of image, voice and video queries per day. GPU inference delivers fast response and maximizes data center throughput.
  9. PLATFORM BUILT FOR AI

    Accelerating every framework and fueling innovation:

    - All major frameworks
    - All use cases: speech, video, translation, personalization
    - Volta Tensor Cores, NVLink, NVSwitch
  10. GPU ENABLES DRAMATIC REDUCTION IN TIME TO TRAIN

    [Chart: relative time to train ResNet-50, 90 epochs to solution.]

    - 2x CPU, single node: 25 days
    - 1x P100, single node: 4.8 days
    - 1x V100, single node: 30 hours
    - DGX-1, 8x V100: 3.3 hours
    - At scale, 2176x V100: <4 minutes

    CPU server: dual-socket Intel Xeon Gold 6140. Sony 2176x V100 record: https://nnabla.org/paper/imagenet_in_224sec.pdf
  11. >2X IN 1.5 YEARS WITH NVIDIA AI SOFTWARE

    [Chart: ResNet-50 time to train (hours) across successive software releases; 2.3X improvement.]

    System: DGX-1 with Tesla V100-SXM2-16GB, mixed precision using Tensor Cores.

    - May-17: initial release, Caffe2
    - Sep-17 (first public container): 17.09 container, MXNet, BS=128
    - Sep-18 (current): 18.09-py3 container, MXNet, BS=128
  12. AI INFERENCE IS EXPLODING

    - Speech: 1 billion voice searches per day (Google, Bing, etc.)
    - Live video: 1 billion videos watched per day (Facebook)
    - Recommendations: 1 trillion ads/rankings per day (impressions)
  13. TENSORRT HYPERSCALE INFERENCE PLATFORM

    - World’s most advanced scale-out GPU
    - Integrated into TensorFlow
    - TensorRT Inference Server
    - ONNX support
  14. NVIDIA TESLA PLATFORM SAVES MONEY

    Game-changing inference performance. Inference workload: speech, NLP and video.

    - CPU-only: 200 CPU servers, 60 kW
    - GPU-accelerated: 1 T4-accelerated server, 2 kW ("5 racks in a box")
  15. BIG DATA INDUSTRY VERTICALS

    From business intelligence to data science: consumer internet, retail, financial services, healthcare.
  16. THE BIG PROBLEM IN DATA SCIENCE

    [Pipeline diagram: all data → ETL → manage data (structured data store) → data preparation → training (model training) → visualization (evaluate) → inference (deploy).]

    Problem: slow training times for data scientists.
  17. RAPIDS: OPEN GPU DATA SCIENCE

    Software stack for data preparation, model training and visualization. Components: Python, CUDA, Apache Arrow, Dask, RAPIDS (cuDF, cuML, cuGraph), deep learning frameworks, cuDNN.

    http://www.rapids.ai/
  18. DEPLOYING RAPIDS: FASTER SPEEDS, REAL WORLD BENEFITS

    [Charts: ML and ETL run times in seconds, comparing one NVIDIA DGX-2 with clusters of 20, 50 and 100 CPU nodes; reference marks at 1 hour and 2 hours.]
  19. DRAMATICALLY MORE FOR YOUR MONEY

    Machine learning workload: XGBoost.

    - CPU-only cluster: 300 self-hosted Broadwell CPU servers, 180 kW
    - GPU-accelerated: 1 DGX-2, 10 kW

    Same throughput at 1/8 the cost, 1/18 the power and 1/30 the space.
  20. CHALLENGES WITH COMPLEX SOFTWARE

    - Current DIY GPU-accelerated AI and HPC deployments are complex and time-consuming to build, test and maintain.
    - Community development of software frameworks is moving very fast.
    - Managing driver, library and framework dependencies requires a high level of expertise.

    [Stack diagram, top to bottom: applications or frameworks, NVIDIA libraries, NVIDIA container runtime, NVIDIA driver, NVIDIA GPU.]
  21. NGC: Accelerated Stacks for AI, Machine Learning, and HPC

    - Innovate in minutes, not weeks
    - Run anywhere
    - Comprehensive library of GPU-accelerated containers

    http://www.nvidia.com/ngc
  22. SATURNV WORKLOADS: CREATING OPPORTUNITIES, SOLVING CHALLENGES

    [Image comparison: actual video vs. AI-rendered video.]

    - Research & development: catapulting the graphics industry into the AI chapter
    - Autonomous cars: super-real-time simulation for self-driving development
    - Robotics: simulation of the real world to train robots
    - RTX graphics: real-time ray tracing and AI for creative applications
  23. NVIDIA DGX SATURNV

    The world’s largest enterprise AI infrastructure buildout:

    - 1,500 DGX nodes
    - 12,600 GPUs
    - 1.5 ExaFLOPS
    - 4 MW average power
  24. DGX POD: INNOVATIONS & KEY METRICS DISTILLED

    - Design best practices: lessons learned from building the world’s largest AI infrastructure
    - Reference architectures: partner reference architectures, scale-up racks
    - Product innovation & quality: rapid exploration and resolution of customer issues
  25. MLPERF: BEST INFRASTRUCTURE FOR BEST RESULTS

    Our latest proof point of infrastructure design leadership:

    - DGX is the foundation of SATURNV
    - Platform for NVIDIA’s MLPerf submission
    - Composed of DGX-1 and DGX-2 nodes, integrated rapidly (DGX POD)
    - Top60-class supercomputing cluster built in under 3 weeks
    - Differentiated architecture, breakthrough performance
  26. Training developers, data scientists, and researchers to solve the world’s most challenging problems

    http://www.nvidia.com/dli
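
The "1000X every 10 years" annotation on the FORCES SHAPING COMPUTING charts (items 2 and 3) implies a constant annual growth factor r. Solving for it is our own arithmetic, not a figure printed on the slides, and it assumes smooth exponential growth; under that assumption performance roughly doubles every year:

$$
r^{10} = 1000 \;\Longrightarrow\; r = 10^{3/10} \approx 1.995,
\qquad
t_{2} = \frac{\ln 2}{\ln r} = \frac{10 \ln 2}{\ln 1000} \approx 1.003 \text{ years}
$$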
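
The NEURAL NETWORK COMPLEXITY IS EXPLODING slide (item 6) gives total multiples per domain. A minimal sketch that converts them into an implied yearly growth factor and doubling time, assuming each multiple spans the full year range on its panel's axis (the slide does not say so explicitly). `PANELS`, `annual_growth` and `doubling_time` are illustrative names, not from the deck.

```python
import math

# Complexity multiples (GOP * bandwidth) from item 6.
# Assumption: each multiple covers the whole year range on its panel's axis.
PANELS = {
    "Speech": (30.0, 2013, 2018),
    "Image": (350.0, 2011, 2017),
    "Translation": (10.0, 2014, 2018),
}


def annual_growth(multiple: float, years: float) -> float:
    """Constant yearly factor r such that r ** years == multiple."""
    return multiple ** (1.0 / years)


def doubling_time(multiple: float, years: float) -> float:
    """Years to double at the constant rate implied by `multiple` over `years`."""
    return years * math.log(2.0) / math.log(multiple)


if __name__ == "__main__":
    for name, (multiple, start, end) in PANELS.items():
        years = end - start
        print(f"{name}: {multiple:g}x in {years} years -> "
              f"{annual_growth(multiple, years):.2f}x/year, "
              f"doubling every {doubling_time(multiple, years):.2f} years")
```

Under these assumptions speech models grow about 1.97x per year, image models about 2.65x and translation models about 1.78x.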
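
The GPU ENABLES DRAMATIC REDUCTION IN TIME TO TRAIN slide (item 10) lists absolute training times; dividing them gives the speedups the chart implies. This sketch assumes the times pair with the hardware configurations in decreasing order (25 days for the CPU server down to under 4 minutes at scale), and it enters "<4 minutes" as exactly 4 minutes, so that row's speedup is only a lower bound. The names are hypothetical.

```python
from datetime import timedelta

# ResNet-50 time to train (90 epochs) per configuration, reconstructed from item 10.
# "<4 minutes" is entered as 4 minutes, so its computed speedup is a lower bound.
TIME_TO_TRAIN = {
    "2x CPU, single node": timedelta(days=25),
    "1x P100, single node": timedelta(days=4.8),
    "1x V100, single node": timedelta(hours=30),
    "DGX-1, 8x V100": timedelta(hours=3.3),
    "2176x V100 at scale": timedelta(minutes=4),
}


def speedups(times: dict, baseline: str) -> dict:
    """Ratio of the baseline's training time to each configuration's time."""
    base = times[baseline]
    # Dividing one timedelta by another yields a float in Python 3.
    return {name: base / t for name, t in times.items()}


if __name__ == "__main__":
    for name, s in speedups(TIME_TO_TRAIN, "2x CPU, single node").items():
        print(f"{name:>22}: {s:8.1f}x faster than the CPU server")
```

Relative to the dual-socket CPU server this gives roughly 5.2x for one P100, 20x for one V100, 182x for a DGX-1 and at least 9000x for the 2176-GPU run.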
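
The daily volumes on the AI INFERENCE IS EXPLODING slide (item 12) translate into sustained request rates that a datacenter inference service must absorb. A minimal sketch, assuming traffic is spread evenly over 24 hours; real traffic is bursty, so peak rates are higher. The helper name is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Daily volumes quoted in item 12.
DAILY_VOLUME = {
    "voice searches (Google, Bing, etc.)": 1e9,
    "videos watched (Facebook)": 1e9,
    "ads/rankings (impressions)": 1e12,
}


def average_per_second(per_day: float) -> float:
    """Mean rate if the daily volume were spread evenly over 24 hours."""
    return per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    for name, per_day in DAILY_VOLUME.items():
        print(f"{name}: ~{average_per_second(per_day):,.0f} per second on average")
```

One billion queries a day averages about 11,574 per second; one trillion ad rankings a day averages about 11.6 million per second.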
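
The NVIDIA TESLA PLATFORM SAVES MONEY slide (item 14) gives server counts and power draw for both deployments but leaves the ratios implicit. A small sketch that derives them, assuming both fleets deliver the same inference throughput, as the comparison implies. `Deployment` and `consolidation` are illustrative names.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Deployment:
    """A fleet of identical servers running the same inference workload."""
    servers: int
    power_kw: float

    @property
    def kw_per_server(self) -> float:
        return self.power_kw / self.servers


# Figures from item 14 (speech, NLP and video inference).
CPU_ONLY = Deployment(servers=200, power_kw=60.0)
T4_ACCELERATED = Deployment(servers=1, power_kw=2.0)


def consolidation(before: Deployment, after: Deployment) -> Tuple[float, float]:
    """Return (server-count ratio, power ratio) of `before` relative to `after`."""
    return before.servers / after.servers, before.power_kw / after.power_kw


if __name__ == "__main__":
    servers_x, power_x = consolidation(CPU_ONLY, T4_ACCELERATED)
    print(f"{servers_x:.0f}x fewer servers, {power_x:.0f}x less power; "
          f"CPU servers average {CPU_ONLY.kw_per_server * 1000:.0f} W each")
```

The slide's numbers work out to 200x fewer servers and 30x less power, with each CPU server averaging 300 W.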
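
The NVIDIA DGX SATURNV totals (item 23) can be cross-checked against item 25, which says the cluster is composed of DGX-1 and DGX-2 nodes. Assuming every one of the 1,500 nodes is either a DGX-1 (8 GPUs) or a DGX-2 (16 GPUs) and all of them count toward the 12,600 GPUs, two linear equations fix the mix. The sketch solves them and averages the 1.5 ExaFLOPS over the GPUs. The resulting split is our inference, not a number NVIDIA states; for comparison, NVIDIA quotes 125 TFLOPS of Tensor Core peak for a V100 SXM2.

```python
from typing import Tuple

GPUS_PER_DGX1 = 8
GPUS_PER_DGX2 = 16


def node_mix(nodes: int, gpus: int,
             small: int = GPUS_PER_DGX1, large: int = GPUS_PER_DGX2) -> Tuple[int, int]:
    """Split `nodes` into (small_count, large_count) whose GPUs sum to `gpus`.

    Solves small_count + large_count == nodes and
    small * small_count + large * large_count == gpus over non-negative integers.
    """
    extra = gpus - small * nodes   # GPUs beyond an all-small cluster
    per_large = large - small      # extra GPUs each large node contributes
    if extra < 0 or extra % per_large != 0:
        raise ValueError("no integer mix of the two node types matches these totals")
    large_count = extra // per_large
    small_count = nodes - large_count
    if small_count < 0:
        raise ValueError("totals need more large nodes than there are nodes")
    return small_count, large_count


def average_tflops_per_gpu(total_flops: float, gpus: int) -> float:
    """Aggregate FLOPS divided evenly over all GPUs, in TFLOPS."""
    return total_flops / gpus / 1e12


if __name__ == "__main__":
    dgx1, dgx2 = node_mix(nodes=1500, gpus=12_600)
    print(f"{dgx1} DGX-1 + {dgx2} DGX-2 nodes")                        # 1425 + 75
    print(f"{average_tflops_per_gpu(1.5e18, 12_600):.0f} TFLOPS/GPU")  # 119
```

Under these assumptions the cluster would hold 1,425 DGX-1 and 75 DGX-2 nodes, averaging about 119 TFLOPS per GPU.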