OpenTalks.AI

AI TODAY AND TOMORROW
Dmitry Konyagin

FORCES SHAPING COMPUTING
[Figure: CPU performance from 1980 to 2020 on a log scale (10^3 to 10^7); annotation: beyond Moore’s Law]

ACCELERATED COMPUTING
[Figure: GPU performance vs. CPU performance on the same log-scale chart (up to 10^7); GPU-accelerated computing continues beyond Moore’s Law at 1000X every 10 years]
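The "1000X every 10 years" figure can be turned into a per-year rate: a constant annual factor r that compounds to 1000X over 10 years satisfies r^10 = 1000, so r = 1000^(1/10) ≈ 2. A minimal sketch of that arithmetic (the function names are our own, for illustration only):

```python
import math


def annual_growth_factor(total_factor: float, years: float) -> float:
    """Constant per-year factor that compounds to total_factor over `years`."""
    return total_factor ** (1.0 / years)


def doubling_time_years(annual_factor: float) -> float:
    """Years needed to double at a constant annual growth factor."""
    return math.log(2) / math.log(annual_factor)


rate = annual_growth_factor(1000, 10)
print(f"~{rate:.2f}x per year, doubling every ~{doubling_time_years(rate):.2f} years")
# prints: ~2.00x per year, doubling every ~1.00 years
```

In other words, the slide's claim amounts to performance roughly doubling every year.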

COMPUTERS WRITING SOFTWARE
[Figure: the accelerated-computing chart alongside a diagram: data → deep neural network → program]
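The "data → deep neural network → program" idea means the program's behavior is learned from examples rather than written by hand. A deliberately tiny sketch, assuming a two-parameter linear model as a stand-in for a deep neural network and plain batch gradient descent as the training procedure (the data and names are hypothetical):

```python
def fit_linear(xs, ys, lr=0.05, epochs=2000):
    """Learn w, b minimizing mean squared error by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of MSE = (1/n) * sum((w*x + b - y)^2) w.r.t. w and b
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Training data produced by a rule the learner never sees: y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]
w, b = fit_linear(xs, ys)


def learned_program(x):
    """The 'program' written by training: apply the learned parameters."""
    return w * x + b
```

Nobody wrote `2 * x + 1` into `learned_program`; the behavior came from the data. A deep neural network does the same thing with millions of parameters, which is why training is so compute-hungry.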

5 AI TRANSFORMING EVERY INDUSTRY
• Infrastructure: 50% reduction in emergency road repair costs
• IoT: >$6M/year savings and reduced risk of outage
• Healthcare: >80% accuracy and immediate alert to radiologists

6 AI IS DRIVING CUSTOMER ENGAGEMENT
Use cases: speech/voice, images/video, recommendations
• 20% higher accuracy for a better user experience
• 50% improvement in user engagement
• 60X better latency for a real-time user experience

7 NEURAL NETWORK COMPLEXITY IS EXPLODING
Bigger and More Compute Intensive
[Figure: model complexity (GOP × bandwidth) over time, three panels. Speech, 2013 to 2018: DeepSpeech, DeepSpeech 2, DeepSpeech 3, 30X. Image, 2011 to 2017: AlexNet, GoogLeNet, Inception-v2, Inception-v4, ResNet-50, 350X. Translation, 2014 to 2018: OpenNMT, GNMT, MoE, 10X.]

8 CAMBRIAN EXPLOSION
• Convolutional networks: ReLU, encoder/decoder, BatchNorm, dropout, pooling, concat
• Recurrent networks: GRU, LSTM, CTC, beam search, WaveNet, attention
• Generative adversarial networks: speech enhancement GAN, coupled GAN, conditional GAN, MedGAN, 3D-GAN
• Reinforcement learning: DQN, simulation, DDPG
• New species: neural collaborative filtering, mixture of experts, block-sparse LSTM, capsule nets

9 DEEP LEARNING IS A NEW COMPUTING MODEL
• Training: billions of trillions of operations. GPUs train larger models and accelerate time to market.
• Inference (datacenter and device): 10s of billions of image, voice, and video queries per day. GPU inference delivers fast response and maximizes data center throughput.

10 PLATFORM BUILT FOR AI
Accelerating Every Framework and Fueling Innovation
• All major frameworks
• All use cases: speech, video, translation, personalization
• Volta Tensor Cores, NVLink, NVSwitch

11 GPU ENABLES DRAMATIC REDUCTION IN TIME TO TRAIN
Time to train ResNet-50 (90 epochs to solution):
• 2x CPU, single node: 25 days
• 1X P100, single node: 4.8 days
• 1X V100: 30 hours
• DGX-1 (8x V100): 3.3 hours
• At scale (2176x V100): <4 minutes
CPU server: dual-socket Intel Xeon Gold 6140. The 2176x V100 result is Sony's record: https://nnabla.org/paper/imagenet_in_224sec.pdf
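The times on this slide translate directly into speedups over the CPU baseline. A small sketch, assuming each configuration pairs with its time in order of increasing speed (our reading of the flattened chart) and treating "<4 minutes" as exactly 4 minutes, so the last speedup is a lower bound:

```python
# ResNet-50 time to train (90 epochs), in hours, as read from the slide.
TIME_TO_TRAIN_HOURS = {
    "2x CPU single node": 25 * 24,     # 25 days
    "1x P100 single node": 4.8 * 24,   # 4.8 days
    "1x V100": 30.0,                   # 30 hours
    "DGX-1 (8x V100)": 3.3,            # 3.3 hours
    "2176x V100 at scale": 4 / 60,     # <4 minutes, used as an upper bound
}


def speedups(times: dict[str, float], baseline: str) -> dict[str, float]:
    """Speedup of every configuration relative to the baseline's time."""
    base = times[baseline]
    return {name: base / hours for name, hours in times.items()}


for name, s in speedups(TIME_TO_TRAIN_HOURS, "2x CPU single node").items():
    print(f"{name}: {s:,.0f}x")
```

Under these assumptions a single V100 is about 20X faster than the dual-socket CPU server, and the 2176-GPU run is at least about 9,000X faster.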

13 AI INFERENCE IS EXPLODING
• Speech: 1 billion voice searches per day (Google, Bing, etc.)
• Live video: 1 billion videos watched per day (Facebook)
• Recommendations: 1 trillion ads/rankings per day (impressions)
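Daily volumes are easier to reason about as request rates. A minimal sketch converting the slide's per-day figures into average requests per second (the dictionary keys are our own labels); peak rates would be higher than these averages:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Per-day inference volumes quoted on the slide.
DAILY_VOLUMES = {
    "voice searches": 1e9,
    "videos watched": 1e9,
    "ads/rankings": 1e12,
}


def per_second(daily: float) -> float:
    """Average rate implied by a per-day volume."""
    return daily / SECONDS_PER_DAY


for name, daily in DAILY_VOLUMES.items():
    print(f"{name}: ~{per_second(daily):,.0f} per second on average")
```

One billion requests a day is roughly 11,600 requests every second, and one trillion is about 11.6 million per second, which is the scale that motivates the inference platform on the next slides.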

14 TENSORRT HYPERSCALE INFERENCE PLATFORM
• World’s most advanced scale-out GPU
• TensorRT: integrated into TensorFlow, with ONNX support
• TensorRT Inference Server

15 NVIDIA TESLA PLATFORM SAVES MONEY
Game-Changing Inference Performance
Inference workload: speech, NLP, and video
• CPU-only: 200 CPU servers, 60 kW
• GPU-accelerated: 1 T4-accelerated server, 2 kW ("5 racks in a box")

16 BIG DATA INDUSTRY VERTICALS
From Business Intelligence to Data Science
• Consumer internet
• Retail
• Financial services
• Healthcare

17 THE BIG PROBLEM IN DATA SCIENCE
• Manage data: all data, ETL, structured data store
• Training: data preparation, model training, visualization
• Evaluate
• Deploy: inference
Slow training times for data scientists
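The pipeline stages on this slide can be illustrated with a deliberately tiny, CPU-only sketch. The CSV data, the function names, and the nearest-centroid model are hypothetical stand-ins chosen for brevity; they are not part of RAPIDS or any NVIDIA tool:

```python
import csv
import io
from statistics import mean

# Hypothetical raw data: one numeric feature and a binary label per row.
RAW_CSV = """feature,label
1.0,0
1.2,0
0.8,0
3.0,1
3.2,1
2.9,1
"""


def etl(raw_text):
    """Extract and transform: parse CSV text into (feature, label) tuples."""
    reader = csv.DictReader(io.StringIO(raw_text))
    return [(float(row["feature"]), int(row["label"])) for row in reader]


def prepare(rows):
    """Data preparation: split alternating rows into train and test sets."""
    return rows[::2], rows[1::2]


def train(rows):
    """Model training: nearest-centroid model (mean feature value per class)."""
    labels = {label for _, label in rows}
    return {c: mean(x for x, label in rows if label == c) for c in labels}


def infer(model, x):
    """Inference: predict the class whose centroid is closest to x."""
    return min(model, key=lambda c: abs(x - model[c]))


def evaluate(model, rows):
    """Evaluation: fraction of rows the model classifies correctly."""
    return sum(infer(model, x) == y for x, y in rows) / len(rows)


train_rows, test_rows = prepare(etl(RAW_CSV))
model = train(train_rows)
accuracy = evaluate(model, test_rows)
```

On real workloads each stage touches far more data and is repeated many times as data scientists iterate, which is where the slide locates the slowdown.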

18 RAPIDS: OPEN GPU DATA SCIENCE
Software stack covering data preparation, model training, and visualization
Components: CUDA, Python, Apache Arrow, Dask, RAPIDS (cuDF, cuML, cuGraph), deep learning frameworks, cuDNN
http://www.rapids.ai/

19 DEPLOYING RAPIDS: FASTER SPEEDS, REAL-WORLD BENEFITS
[Figure: two bar charts, ML and ETL, comparing one NVIDIA DGX-2 with clusters of 20, 50, and 100 CPU nodes; time in seconds, with axes spanning 1 hour (3,600 s) and 2 hours (7,200 s)]

20 DRAMATICALLY MORE FOR YOUR MONEY
Machine learning workload: XGBoost
• CPU-only cluster: 300 self-hosted Broadwell CPU servers, 180 kW
• GPU-accelerated: 1 DGX-2, 10 kW
Same throughput at 1/8 the cost, 1/18 the power, and 1/30 the space
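The power and consolidation ratios behind this slide and the T4 inference comparison on slide 15 can be checked from the quoted server counts and kilowatts. A minimal sketch; the `Cluster` type and `ratios` helper are hypothetical names, and cost and space ratios are left out because prices and rack sizes are not given on the slides:

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    servers: int
    power_kw: float


def ratios(cpu: Cluster, gpu: Cluster) -> dict[str, float]:
    """Consolidation and power ratios of a CPU-only vs. GPU-accelerated setup."""
    return {
        "servers_per_gpu_server": cpu.servers / gpu.servers,
        "power_reduction": cpu.power_kw / gpu.power_kw,
        "cpu_kw_per_server": cpu.power_kw / cpu.servers,
    }


# Slide 15: 200 CPU servers at 60 kW vs. 1 T4-accelerated server at 2 kW
inference = ratios(Cluster(200, 60.0), Cluster(1, 2.0))
# Slide 20: 300 Broadwell servers at 180 kW vs. 1 DGX-2 at 10 kW (XGBoost)
xgboost = ratios(Cluster(300, 180.0), Cluster(1, 10.0))
```

The XGBoost case reproduces the slide's "1/18 the power" (180 kW / 10 kW), while the inference case works out to a 30X power reduction.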

21 CHALLENGES WITH COMPLEX SOFTWARE
• Current DIY GPU-accelerated AI and HPC deployments are complex and time-consuming to build, test, and maintain
• Community development of software frameworks is moving very fast
• Managing driver, library, and framework dependencies requires a high level of expertise
Software stack (top to bottom): applications or frameworks, NVIDIA libraries, NVIDIA Container Runtime, NVIDIA driver, NVIDIA GPU

22 NGC: Accelerated Stacks for AI, Machine Learning, and HPC
• Comprehensive library of GPU-accelerated containers
• Innovate in minutes, not weeks
• Run anywhere
http://www.nvidia.com/ngc

23 A CONSISTENT, HYBRID CLOUD EXPERIENCE ACROSS COMPUTE PLATFORMS

24 SATURNV WORKLOADS: CREATING OPPORTUNITIES, SOLVING CHALLENGES
NGC Accelerated Stacks for AI, Machine Learning, and HPC
• Research & development: catapulting the graphics industry into the AI chapter [video panels: actual video vs. AI-rendered video]
• Autonomous cars: super-real-time simulation for self-driving development
• Robotics: simulation of the real world to train robots
• RTX graphics: real-time ray tracing and AI for creative applications

25 NVIDIA DGX SATURNV
The World’s Largest Enterprise AI Infrastructure Buildout
• 1,500 DGX nodes
• 12,600 GPUs
• 1.5 exaFLOPS
• 4 MW average power
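The SATURNV totals allow a few back-of-envelope checks. Assuming every node is either an 8-GPU DGX-1 or a 16-GPU DGX-2 (slide 27 says the cluster is composed of these two node types; the actual mix is not given), the node and GPU counts determine the mix. Dividing 1.5 exaFLOPS by 12,600 GPUs gives about 119 TFLOPS per GPU, which would be consistent with the figure being a mixed-precision Tensor Core peak (the V100's published Tensor Core peak is 125 TFLOPS). The helper name below is hypothetical:

```python
def node_mix(total_nodes: int, total_gpus: int, gpus_a: int = 8, gpus_b: int = 16):
    """Solve a + b = total_nodes and gpus_a*a + gpus_b*b = total_gpus."""
    b, remainder = divmod(total_gpus - gpus_a * total_nodes, gpus_b - gpus_a)
    if remainder or b < 0 or b > total_nodes:
        raise ValueError("no integer node mix matches these totals")
    return total_nodes - b, b


# Assumed: 8-GPU DGX-1 nodes and 16-GPU DGX-2 nodes only.
dgx1_nodes, dgx2_nodes = node_mix(1500, 12_600)
flops_per_gpu = 1.5e18 / 12_600   # ~1.19e14, i.e. ~119 TFLOPS
kw_per_node = 4_000 / 1500        # ~2.67 kW average per node
```

Under that assumption the cluster would be 1,425 DGX-1 nodes and 75 DGX-2 nodes. If other node types were involved, the helper would either raise or return a mix that does not reflect the real deployment.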

26 DGX POD: INNOVATIONS & KEY METRICS DISTILLED
• Design best practices: lessons learned from building the world’s largest AI infrastructure
• Reference architectures: partner reference architectures, scale-up racks
• Product innovation & quality: rapid exploration and resolution of customer issues

27 MLPERF: BEST INFRASTRUCTURE FOR BEST RESULTS
Our Latest Proof Point of Infrastructure Design Leadership
• DGX is the foundation of SATURNV
• Platform for NVIDIA’s MLPerf submission
• Composed of DGX-1 and DGX-2 nodes, integrated rapidly (DGX POD)
• Top 60-class supercomputing cluster built in under 3 weeks
• Differentiated architecture, breakthrough performance

28 Training developers, data scientists, and researchers to solve the world’s most challenging problems
http://www.nvidia.com/dli

12 >2X IN 1.5 YEARS WITH NVIDIA AI SOFTWARE (ResNet-50)
