
Deep Learning at Scale


The advent of modern deep learning techniques has given organizations new tools to understand, query, and structure their data. However, maintaining complex pipelines, versioning models, and tracking accuracy regressions over time remain ongoing struggles for even the most advanced data engineering teams. This talk presents a simple architecture for deploying machine learning at scale and offers suggestions for how companies can get their feet wet with open source technologies they already deploy.

Presented at Big Data Day LA 2016.


Alex Kern

July 09, 2016


Transcript

  1. [Chart: ImageNet challenge winners, top-5 classification error (lower is better): 2010: 0.282, 2011: 0.258, 2012: 0.153, 2013: 0.112, 2014: 0.074, 2015: 0.036; human: ~0.05]
  2. [Same chart, annotated: the 2012 winner, AlexNet, marks the dividing line between "before deep learning" and "after deep learning" — "much wow"]
  3. deep convolutional neural networks, affectionately known as "convnets":
     input → convolution, activation, & pooling layers → fully-connected layers (perceptrons) → output
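The layer types named on the slide can be sketched numerically. The following toy NumPy forward pass (an illustration assumed here, not code from the talk) runs one convolution, a ReLU activation, and one 2x2 max-pooling step:

```python
import numpy as np

def conv2d(image, kernel):
    # "valid" 2-D convolution (really cross-correlation, as in most convnet libraries)
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    # activation: keep positive responses, zero out the rest
    return np.maximum(x, 0)

def max_pool(x, size=2):
    # pooling: downsample by taking the max over non-overlapping size x size windows
    h, w = x.shape
    h, w = h - h % size, w - w % size
    return x[:h, :w].reshape(h // size, size, w // size, size).max(axis=(1, 3))

image = np.random.rand(8, 8)                  # toy single-channel "image"
kernel = np.array([[1., 0.], [0., -1.]])      # toy edge-like filter
features = max_pool(relu(conv2d(image, kernel)))
print(features.shape)  # 8x8 image -> 7x7 conv output -> 3x3 after 2x2 pooling
```

A real convnet stacks many such stages, with learned kernels, before the fully-connected layers produce the output.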
  4. 1) data collection & cleaning: the more clean data, the better; 2) model training & selection: anywhere from 8 hrs to 2 wks
  5. 1) data collection & cleaning: the more clean data, the better; 2) model training & selection: anywhere from 8 hrs to 2 wks; 3) serving in production: with real-time or batch requests
  6. 1) data collection & cleaning: the more clean data, the better; 2) model training & selection: anywhere from 8 hrs to 2 wks; 3) serving in production: with real-time or batch requests; ∞) rinse & repeat: keep models fresh with new data
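The four stages above can be sketched as a skeleton pipeline. All function names and the stand-in "model" below are hypothetical illustrations, not the talk's implementation:

```python
def clean(records):
    # 1) data collection & cleaning: drop records with missing labels
    return [r for r in records if r.get("label") is not None]

def train(dataset):
    # 2) model training & selection: stand-in "model" that always
    # predicts the most common label seen in training
    labels = [r["label"] for r in dataset]
    majority = max(set(labels), key=labels.count)
    return lambda features: majority

def serve(model, batch):
    # 3) serving in production: answer a batch of requests
    return [model(x) for x in batch]

# ∞) rinse & repeat: rerun the loop whenever fresh data arrives
raw = [{"label": "cat"}, {"label": None}, {"label": "cat"}, {"label": "dog"}]
model = train(clean(raw))
predictions = serve(model, [[0.1], [0.2]])
print(predictions)  # ['cat', 'cat']
```

In practice stage 2 is the expensive step (the "8 hrs to 2 wks" on the slide), which is why it is worth automating the retraining loop rather than running it by hand.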
  7. AWS G2 instances: mid-tier on-demand GPUs

     Model       GPUs  vCPU  Mem (GiB)  SSD Storage (GB)
     g2.2xlarge  1     8     15         1 x 60
     g2.8xlarge  4     32    60         2 x 120
  8. from keras.models import Sequential
     from keras.layers import Dense, Activation

     model = Sequential()
     model.add(Dense(output_dim=64, input_dim=100))
     model.add(Activation("relu"))
     model.add(Dense(output_dim=10))
     model.add(Activation("softmax"))
     model.compile(
         loss='categorical_crossentropy',
         optimizer='sgd',
         metrics=['accuracy'])
     model.fit(X_train, Y_train, nb_epoch=5, batch_size=32)
  9. suggested technologies
     • Neural Network Libraries: Caffe & CaffeOnSpark, TensorFlow, Torch, Keras
     • Hyperparameter Optimization: MOE, hyperopt, Spearmint
     • Infrastructure and Hardware: Apache Spark & HDFS, NVIDIA CUDA, Amazon Web Services G2 instances
     such scale much wow
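The hyperparameter-optimization tools listed above (MOE, hyperopt, Spearmint) automate search over settings like learning rate and batch size, typically with Bayesian strategies. A toy random search, written here as a dependency-free illustration of the idea (the `score` function is a stand-in, not a real training run):

```python
import random

random.seed(0)

def score(lr, batch_size):
    # stand-in for "train a model with these settings, return validation
    # accuracy"; this toy surface peaks near lr=0.01, batch_size=32
    return 1.0 - abs(lr - 0.01) * 10 - abs(batch_size - 32) / 256

best = None
for _ in range(50):
    lr = 10 ** random.uniform(-4, 0)             # sample lr log-uniformly in [1e-4, 1]
    batch_size = random.choice([16, 32, 64, 128])
    s = score(lr, batch_size)
    if best is None or s > best[0]:
        best = (s, lr, batch_size)

print(best)  # best (score, lr, batch_size) found by the search
```

Since each trial is an independent (and expensive) training run, this loop parallelizes naturally across GPU instances; the listed libraries add smarter sampling than pure random search.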
  10. references
      • icons by John Caserta, Liau Jian Jie, Garrett Knoll, Luboš Volkov, Noe Araujo from the Noun Project
      • images from Andrej Karpathy
      about me
      • Alex Kern, co-founder & CTO of
      • we help you structure image & video w/ deep learning
      • @KernCanCode on Twitter
      • @kern on GitHub
      in summary
      • deep learning is great for many kinds of media
      • you can scale a deep learning system on Spark & AWS
      • get started @ bit.ly/pavlovtensor & bit.ly/pavlov231