Deep Learning with
Spark
Dr Kashif Rasul¹ and Shoaib Burq²
Zürich Apache Spark Meetup
25.04.2017
¹ https://research.zalando.com/, Twitter: @krasul
² http://geografia.com.au, Twitter: @sabman
Agenda
• Introduction to deep learning
• DeepLearning4J
• Distributed training
• Prototyping in Python
• Summary
Deep learning (DL)
• Subfield of machine learning
• Concerned with learning increasingly meaningful
representations
• Modern methods involve tens or even hundreds of
successive layers of representation
• All learned from exposure to lots of training data
Data-driven approach
• Problem: mapping an image (e.g. a photo of a cat) to the label "cat"
• A data-driven approach consists of three parts:
1. Score (or prediction): our deep learning model
2. Loss: a measure of our model's performance
3. Optimization: a way to change our model to minimize the loss
Linear case
• Data: $\{(x_i, y_i)\}_{i=1}^N$, with each $y_i$ one of $K$ distinct labels
• Score: $f(x_i; W, b) = W x_i + b$
• Loss: $L = \frac{1}{N} \sum_i L_i$, e.g. a softmax cross-entropy per example
• Optimization: change $W$ in the direction of $-\nabla_W L$ to find the optimal $W$ (a minimal sketch follows)
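To make the three parts concrete, here is a minimal sketch of one gradient step for a single linear score, in plain Java. The squared-error loss and every name here (w, b, lr) are illustrative assumptions, not code from the talk; a real classifier over K labels would use a softmax loss:

```java
public class LinearStep {
    public static void main(String[] args) {
        double[] w = {0.1, -0.2, 0.05}; // weights W for one class
        double b = 0.0;                 // bias
        double[] x = {1.0, 2.0, 3.0};   // one training input
        double y = 1.0;                 // its target score
        double lr = 0.02;               // learning rate

        // Score: f(x; W, b) = W·x + b
        double score = b;
        for (int i = 0; i < w.length; i++) score += w[i] * x[i];

        // Loss: L = (f(x) - y)^2, so dL/dscore = 2 * (f(x) - y)
        double dScore = 2 * (score - y);

        // Optimization: step W and b against the gradient of L
        for (int i = 0; i < w.length; i++) w[i] -= lr * dScore * x[i];
        b -= lr * dScore;

        System.out.printf("score %.3f, loss %.3f%n", score, (score - y) * (score - y));
    }
}
```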
DL hype?
• Offers better performance on many problems, especially for
computer vision, audio and text tasks
• Automates "feature engineering"
• Advances in:
1. hardware
2. datasets and benchmarks
3. algorithms
DL frameworks
• Collections of many types of layers
• Composition API via a computational graph (values, i.e. tensors, flow from the inputs to the outputs)
• Automatic differentiation of each node to implement backpropagation (a toy sketch follows this list)
• APIs to run the optimization on a predefined model or
graph with training data and labels
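To illustrate, here is a toy scalar version of such a graph in plain Java, with reverse-mode autodiff recorded on a tape. Everything here (the Value class, the tape) is an illustrative sketch, not any framework's API; real frameworks apply the same idea per tensor operation:

```java
import java.util.ArrayList;
import java.util.List;

class Value {
    double data, grad;
    final List<Runnable> tape; // shared record of backward steps, in forward order

    Value(double data, List<Runnable> tape) { this.data = data; this.tape = tape; }

    Value mul(Value b) {
        Value out = new Value(data * b.data, tape);
        tape.add(() -> {            // chain rule for c = a * b
            this.grad += b.data * out.grad;
            b.grad += this.data * out.grad;
        });
        return out;
    }

    Value add(Value b) {
        Value out = new Value(data + b.data, tape);
        tape.add(() -> {            // chain rule for c = a + b
            this.grad += out.grad;
            b.grad += out.grad;
        });
        return out;
    }

    static void backprop(Value loss) {
        loss.grad = 1.0;            // dLoss/dLoss
        for (int i = loss.tape.size() - 1; i >= 0; i--) loss.tape.get(i).run();
    }

    public static void main(String[] args) {
        List<Runnable> tape = new ArrayList<>();
        Value w = new Value(2.0, tape), x = new Value(3.0, tape), b = new Value(1.0, tape);
        Value y = w.mul(x).add(b);  // y = w*x + b = 7
        backprop(y);
        System.out.println(w.grad + " " + x.grad + " " + b.grad); // 3.0 2.0 1.0
    }
}
```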
CIFAR-10
• 32×32-pixel RGB images
• 10 classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck
• 50,000 training images
• 10,000 test images
Convolutional
Networks
(ConvNets)
• Convolutional layers are arranged in 3 dimensions: width, height, depth
• The neurons in a layer will only be connected to a small region of the layer before it
• ConvNets transform a 3D volume to another 3D volume (a sizing rule follows below)
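A standard sizing rule makes the volume-to-volume claim concrete (this formula is not on the slides, but it follows directly from the sliding-filter picture on the next slides): for input width $W_{\text{in}}$, filter size $F$, padding $P$ and stride $S$,

$$W_{\text{out}} = \frac{W_{\text{in}} - F + 2P}{S} + 1$$

e.g. a 32×32×3 CIFAR-10 image with a 5×5 filter, stride 1 and no padding yields $(32 - 5 + 0)/1 + 1 = 28$, i.e. a 28×28 activation map per filter.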
Intuition
• A convolutional layer's weights consist of small learnable filters
• Each filter is small spatially, but extends through the full depth of the input volume
• We slide each filter across the input volume, producing a 2-dimensional activation map for that filter
• As we slide the filter, we are computing the dot product between the filter and the input
• We want to learn filters that activate when they see some specific type of feature at some spatial position in the input
• Stacking these maps for all 5x5x3 filters (6 for this layer) along the depth forms the full output volume
• This process is differentiable (it's also a convolution); a toy single-channel sketch follows
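A toy single-channel sketch of this sliding dot product (all names are illustrative; a real 5x5x3 filter would additionally sum over the depth dimension):

```java
public class NaiveConv {
    // "Valid" convolution: stride 1, no padding, one channel
    static double[][] convolve(double[][] input, double[][] filter) {
        int n = input.length, k = filter.length;
        int out = n - k + 1;
        double[][] map = new double[out][out]; // the 2-D activation map
        for (int r = 0; r < out; r++)
            for (int c = 0; c < out; c++)
                for (int i = 0; i < k; i++)
                    for (int j = 0; j < k; j++)
                        // dot product between the filter and the patch at (r, c)
                        map[r][c] += filter[i][j] * input[r + i][c + j];
        return map;
    }
}
```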
DeepLearning4J (DL4J)
• Java-based DL framework
• Multi-GPU (NVIDIA) support
• Uses Spark to parallelize training via "data parallelism"
• Can import Keras models
• Helper libraries and sample code on GitHub
cifarTrain = new CifarDataSetIterator(batchSize, ...);
cifarTest = ...
MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
    .seed(seed)
    ... // score, loss and optimization configuration here
MultiLayerNetwork model = new MultiLayerNetwork(conf);
model.init();
for (int i = 0; i < nEpochs; i++) {
    model.fit(cifarTrain); // one pass over the training iterator per epoch
}
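After training, the held-out set can be scored; a short sketch, assuming the cifarTest iterator declared above (evaluate(DataSetIterator) is MultiLayerNetwork's standard evaluation call):

```java
// Sketch: evaluate the trained model on the test iterator
Evaluation eval = model.evaluate(cifarTest);
System.out.println(eval.stats()); // accuracy, precision, recall, F1
```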
MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
    .seed(seed)
    .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
    .iterations(1)
    .activation(Activation.LEAKYRELU)
    .weightInit(WeightInit.XAVIER)
    .learningRate(0.02)
    .updater(Updater.NESTEROVS).momentum(0.9)
    .regularization(true).l2(1e-4)
    .list()
    .layer(0, new DenseLayer.Builder().nIn(32 * 32 * 3).nOut(500).build())
    .layer(1, new DenseLayer.Builder().nIn(500).nOut(100).build())
    .layer(2, new OutputLayer.Builder(
            LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
        .activation(Activation.SOFTMAX).nIn(100).nOut(10).build())
    .pretrain(false).backprop(true)
    .build();
...
.layer(1, new ConvolutionLayer.Builder(3, 3)
    .nIn(channels)
    .padding(1, 1)
    .nOut(64)
    .weightInit(WeightInit.RELU)
    .activation(Activation.LEAKYRELU)
    .build())
.layer(2, new SubsamplingLayer.Builder(
        SubsamplingLayer.PoolingType.MAX)
    .kernelSize(2, 2)
    .build())
.layer(3, new ConvolutionLayer.Builder(3, 3)...)
...
Stochastic gradient descent (SGD)
• Vanilla optimization: update the weights with respect to all the data
• Vanilla SGD: iteratively update the weights with respect to a small random batch of data (batchSize)
• Once the updates have collectively seen all the data, one epoch has passed
• Fancier SGD variants add momentum terms etc. (see the sketch below)
• Inherently a sequential process
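Schematically, mini-batch SGD with a momentum term looks like the sketch below. The helpers batches(...), gradient(...) and initWeights() and all names are hypothetical placeholders for illustration, not DL4J calls:

```java
double lr = 0.02, mu = 0.9;                        // learning rate, momentum
double[] w = initWeights();                        // hypothetical initializer
double[] v = new double[w.length];                 // momentum buffer, starts at 0
for (int epoch = 0; epoch < nEpochs; epoch++) {
    for (Batch batch : batches(data, batchSize)) { // random mini-batches
        double[] g = gradient(w, batch);           // dLoss/dw on this batch only
        for (int i = 0; i < w.length; i++) {
            v[i] = mu * v[i] - lr * g[i];          // momentum accumulates past gradients
            w[i] += v[i];                          // one sequential update
        }
    }
    // all batches seen once => one epoch
}
```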
SparkConf sparkConf = new SparkConf();
JavaSparkContext sc = new JavaSparkContext(sparkConf);
cifarTrain = new CifarDataSetIterator(batchSizePerWorker,...);
List<DataSet> trainDataList = new ArrayList<>();
while (cifarTrain.hasNext()) {
trainDataList.add(cifarTrain.next());
}
JavaRDD<DataSet> trainData = sc.parallelize(trainDataList);
MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
    ...
TrainingMaster tm = new ParameterAveragingTrainingMaster.Builder(batchSizePerWorker)
    .averagingFrequency(5)      // average worker parameters every 5 minibatches
    .workerPrefetchNumBatches(2)
    .batchSizePerWorker(batchSizePerWorker)
    .build();
SparkDl4jMultiLayer sparkNet = new SparkDl4jMultiLayer(sc, conf, tm);
for (int i = 0; i < nEpochs; i++) {
    sparkNet.fit(trainData); // one distributed pass over the RDD per epoch
}
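A sketch of what typically follows, assuming the cifarTest iterator from earlier (getNetwork() returns the network with the averaged parameters after fitting):

```java
// Sketch: pull the trained parameters back and evaluate locally
MultiLayerNetwork trained = sparkNet.getNetwork();
Evaluation eval = trained.evaluate(cifarTest);
System.out.println(eval.stats());
```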
CudaEnvironment.getInstance().getConfiguration()
    .allowMultiGPU(true)
    .setMaximumDeviceCache(2L * 1024L * 1024L * 1024L)
    .allowCrossDeviceAccess(true);
MultiLayerConfiguration conf = new NeuralNetConfiguration...
MultiLayerNetwork model = new MultiLayerNetwork(conf);
ParallelWrapper wrapper = new ParallelWrapper.Builder(model)
    .prefetchBuffer(24).workers(4)
    .averagingFrequency(3).useLegacyAveraging(true)
    .build();
for (int i = 0; i < nEpochs; i++) {
    wrapper.fit(cifarTrain); // trains across the 4 GPU workers
}
Keras
• High-level API based on Python
• Backend: TensorFlow or Theano
• Allows for easy and fast prototyping
• Models are described in Python code
from keras.layers import Input, Dense
from keras.models import Model

inputs = Input(shape=(784,))
x = Dense(64, activation='relu')(inputs)
x = Dense(64, activation='relu')(x)
scores = Dense(10, activation='softmax')(x)
model = Model(inputs=inputs, outputs=scores)
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(data, labels)
from keras.models import load_model, model_from_json

# creates a HDF5 file 'my_model.h5'
model.save('my_model.h5')
model = load_model('my_model.h5')

# model reconstruction from JSON:
json_string = model.to_json()
model = model_from_json(json_string)

# save and reload the model weights only
model.save_weights('my_model_weights.h5')
model.load_weights('my_model_weights.h5')
// configuration only
MultiLayerConfiguration modelConfig =
    KerasModelImport.importKerasSequentialConfiguration(
        "PATH TO YOUR JSON FILE", enforceTrainingConfig);

// configuration and weights
MultiLayerNetwork network =
    KerasModelImport.importKerasSequentialModelAndWeights(
        "PATH TO YOUR HDF5 FILE", enforceTrainingConfig);
Summary
• DL: learning successive "layers" of representations
• Data-driven approach: three parts (score, loss, optimization)
• Frameworks: collections of layers and a computational graph
• ConvNets: transform 3D volumes into 3D volumes
• DL4J: implements both types of parallelism (data and model)
• Suggestion: prototype in Keras and train in DL4J
Thank you!
Questions?
Check out http://deeplearningbox.com/
My book: https://leanpub.com/big-geodata-analysis-with-apache-spark