March 21, 2018

Jupyter Popup: Accelerating the Machine Learning Pipeline on Very Large Datasets with the GPU Data Frame

Use of the humble GPU has spiked over the past couple of years as machine learning and data analytics workloads have been optimized to take advantage of the GPU’s parallelism and memory bandwidth. Even though the steps of the Machine Learning Pipeline could all be run on the same GPUs, they were typically isolated from one another and much slower than they needed to be, because data was serialized and deserialized between steps over PCIe.

That inefficiency was recently addressed by the formation of the GPU Open Analytics Initiative (GOAI, http://gpuopenanalytics.com/), an industry group founded by MapD, H2O.ai and Anaconda. This group created the GPU data frame (GDF), based on Apache Arrow, for passing data between processes while keeping it all in GPU memory.
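
The difference between a serialized hand-off and a shared-buffer hand-off can be illustrated with a small CPU-side analogy. This is only a sketch using the Python standard library: it does not use the GDF or any GPU API, and the column contents and the function names `handoff_serialized` and `handoff_shared` are hypothetical names chosen for illustration.

```python
import pickle
from array import array

# Hypothetical column of feature values produced by an earlier pipeline step.
column = array("d", [1.0, 2.0, 3.0, 4.0])


def handoff_serialized(col):
    """Copy-based hand-off: serialize, then deserialize into a new buffer."""
    payload = pickle.dumps(col)
    return pickle.loads(payload), len(payload)


def handoff_shared(col):
    """Zero-copy hand-off: the consumer gets a view onto the same buffer."""
    return memoryview(col)


copied, payload_bytes = handoff_serialized(column)
shared = handoff_shared(column)

# A later write by the producer is visible through the shared view,
# but not in the deserialized copy, which owns separate memory.
column[0] = 10.0
copy_sees_update = copied[0] == 10.0
view_sees_update = shared[0] == 10.0
```

The serialized path pays for an encode, a transfer, and a decode at every step boundary, while the shared path only passes a reference to memory that already exists. The GDF applies the same idea to buffers that live in GPU memory.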

In this talk, Aaron will explain how the GDF technology works, show how it is enabling a diverse set of GPU workloads, and demonstrate how to use a Jupyter Notebook to take advantage of it. We’ll demonstrate on a very large dataset how to manage a full Machine Learning Pipeline with minimal data exchange overhead between MapD’s SQL engine and H2O’s generalized linear model library (GLM).

https://www.eventbrite.com/e/jupyter-pop-up-tickets-42550005211

OmniSci

Transcript

MapD is the analytics platform created for GPUs
DEMO TIME
Advanced memory management
GPU RAM (L1): 24GB to 256GB, 1000-6000 GB/sec. Hot data: speedup of 1500x to 5000x over cold data.
CPU RAM (L2): 32GB to 3TB, 70-120 GB/sec. Warm data: speedup of 35x to 120x over cold data.
SSD or NVRAM storage (L3): 250GB to 20TB, 1-2 GB/sec. Cold data.
Layers: compute layer, storage layer.
Data Lake/Data Warehouse/System Of Record
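
To get a feel for the bandwidth figures on the memory management slide, the following sketch estimates how long a single full scan of a dataset would take at each tier. The 100 GB dataset size is a hypothetical value, and the estimate considers bandwidth only (no latency, compute, or transfer costs), so it is not expected to reproduce the speedup figures quoted on the slide.

```python
# Bandwidth ranges in GB/sec (low, high), as listed on the slide.
TIERS = {
    "GPU RAM (L1)": (1000, 6000),
    "CPU RAM (L2)": (70, 120),
    "SSD or NVRAM storage (L3)": (1, 2),
}


def scan_seconds(size_gb, bandwidth_gb_per_sec):
    """Time to read size_gb once at the given sustained bandwidth."""
    return size_gb / bandwidth_gb_per_sec


def scan_time_ranges(size_gb):
    """Return {tier: (best_case_seconds, worst_case_seconds)}."""
    return {
        tier: (scan_seconds(size_gb, high), scan_seconds(size_gb, low))
        for tier, (low, high) in TIERS.items()
    }


DATASET_GB = 100  # hypothetical dataset size, not from the talk
for tier, (best, worst) in scan_time_ranges(DATASET_GB).items():
    print(f"{tier}: {best:.3f} s to {worst:.3f} s")
```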
The GPU Open Analytics Initiative (GOAI)
Machine Learning Pipeline
ML Examples
Next Steps
Thank you! Any questions?