Slide 1

Introduction to Deep Learning with KotlinDL
Alexey Zinovyev, ML Engineer, Apache Ignite PMC, JetBrains

Slide 2

Bio
1. Java & Kotlin developer
2. Distributed ML enthusiast
3. Apache Ignite PMC
4. TensorFlow Contributor
5. ML engineer at JetBrains
6. Happy father and husband
7. https://github.com/zaleslaw

Slide 3

Motivation
1. Kotlin has set a course to become a convenient language for data science
2. There is no modern data science without neural networks
3. All deep learning frameworks are good enough at image recognition
4. Convolutional neural networks (CNNs) are the gold standard for image recognition
5. Training, transfer learning, and inference are now available for different CNN architectures in Kotlin with the KotlinDL library

Slide 4

Agenda
1. Neural Network Intro
2. Deep Learning
3. Required and optional math knowledge
4. Primitives, or building blocks:
   a. Activation Functions
   b. Loss Functions
   c. Initializers
   d. Optimizers
   e. Layers
5. 5 Major Scientific Breakthroughs in DL
6. KotlinDL Demo

Slide 5

Some basic terms 1. Model

Slide 6

Some basic terms 1. Model 2. Inference

Slide 7

Some basic terms 1. Model 2. Inference 3. Training

Slide 8

Some basic terms 1. Model 2. Inference 3. Training 4. Transfer Learning

Slide 9

Some basic terms 1. Model 2. Inference 3. Training 4. Transfer Learning 5. Evaluation

Slide 10

Some basic terms 1. Model 2. Inference 3. Training 4. Transfer Learning 5. Evaluation 6. Train/validation/test datasets

Slide 11

Neural Network Intro

Slide 12

The life of one neuron

Slide 13

The place of one neuron in its family

Slide 14

Forward propagation

Slide 15

Cat/Dog neural network architecture in Kotlin
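The original slide shows the architecture as code. As a stand-in for this transcript, here is a minimal sketch of such a VGG-style Sequential model written against the early (0.1-era) KotlinDL API; the import paths, parameter names, image size, and layer configuration are illustrative and may differ from the slide and from current KotlinDL releases.

    // Minimal VGG-style CNN sketch for a 2-class (cat/dog) problem, loosely following
    // KotlinDL's published examples. Package layout and defaults vary between versions.
    import org.jetbrains.kotlinx.dl.api.core.Sequential
    import org.jetbrains.kotlinx.dl.api.core.activation.Activations
    import org.jetbrains.kotlinx.dl.api.core.initializer.HeNormal
    import org.jetbrains.kotlinx.dl.api.core.initializer.Zeros
    import org.jetbrains.kotlinx.dl.api.core.layer.Dense
    import org.jetbrains.kotlinx.dl.api.core.layer.Flatten
    import org.jetbrains.kotlinx.dl.api.core.layer.Input
    import org.jetbrains.kotlinx.dl.api.core.layer.twodim.Conv2D
    import org.jetbrains.kotlinx.dl.api.core.layer.twodim.ConvPadding
    import org.jetbrains.kotlinx.dl.api.core.layer.twodim.MaxPool2D

    val model = Sequential.of(
        Input(64, 64, 3),                                   // 64x64 RGB input (assumed size)
        Conv2D(filters = 32, kernelSize = longArrayOf(3, 3),
               strides = longArrayOf(1, 1, 1, 1),
               activation = Activations.Relu,
               kernelInitializer = HeNormal(), biasInitializer = Zeros(),
               padding = ConvPadding.SAME),
        MaxPool2D(poolSize = intArrayOf(1, 2, 2, 1), strides = intArrayOf(1, 2, 2, 1)),
        Conv2D(filters = 64, kernelSize = longArrayOf(3, 3),
               strides = longArrayOf(1, 1, 1, 1),
               activation = Activations.Relu,
               kernelInitializer = HeNormal(), biasInitializer = Zeros(),
               padding = ConvPadding.SAME),
        MaxPool2D(poolSize = intArrayOf(1, 2, 2, 1), strides = intArrayOf(1, 2, 2, 1)),
        Flatten(),
        Dense(outputSize = 128, activation = Activations.Relu,
              kernelInitializer = HeNormal(), biasInitializer = Zeros()),
        Dense(outputSize = 2, activation = Activations.Linear,   // two classes: cat / dog
              kernelInitializer = HeNormal(), biasInitializer = Zeros())
    )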

Slide 16

MNIST example

Slide 17

MNIST Subset

Slide 18

Backward propagation

Slide 19

Full training with some maths
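For readers of the transcript, the "maths" behind this slide boils down to the chain rule applied layer by layer (the notation here is generic, not copied from the slide): for a weight w_ij feeding neuron j,

    \frac{\partial L}{\partial w_{ij}} =
        \frac{\partial L}{\partial a_j} \cdot
        \frac{\partial a_j}{\partial z_j} \cdot
        \frac{\partial z_j}{\partial w_{ij}},
    \qquad z_j = \sum_i w_{ij} x_i + b_j, \quad a_j = f(z_j)

where L is the loss, f the activation, and x_i the inputs of the layer. The resulting gradients are then plugged into the update rule shown on the Gradient Descent slide below.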

Slide 20

Deep Learning is just...

Slide 21

A way to approximate an unknown function

Slide 22

math

Slide 23

Need to keep in mind
● Some trigonometry, exponentials, and logarithms
● Linear algebra: vectors, vector spaces
● Linear algebra: inverse and transpose matrices, matrix decomposition, eigenvectors, the Kronecker-Capelli theorem
● Mathematical analysis: continuous, monotonic, differentiable functions
● Mathematical analysis: derivatives, partial derivatives, the Jacobian
● Methods of one-dimensional and multidimensional optimization
● Gradient descent and all its variations
● A background in optimization methods and convex analysis won't hurt either

Slide 24

N-dimensional space in theory

Slide 25

Matrix multiplication (friendly reminder)
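The reminder in symbols (standard definition, added for the transcript): for A of shape m x n and B of shape n x p,

    (AB)_{ij} = \sum_{k=1}^{n} A_{ik} B_{kj}, \qquad AB \in \mathbb{R}^{m \times p}

which is exactly the operation a dense layer performs on a batch of inputs.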

Slide 26

Loss Functions

Slide 27

Loss Functions
● Every loss function can be reused as a metric
● A loss function should be differentiable
● Not every metric can be a loss function (a metric may not have a derivative)
● Loss functions can be very complex
● Different loss functions are used for regression and classification tasks
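Two common examples, for regression and classification respectively (standard definitions, not taken from the slide):

    \text{MSE}(y, \hat{y}) = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2
    \qquad
    \text{CrossEntropy}(y, \hat{y}) = - \sum_{c=1}^{C} y_c \log \hat{y}_c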

Slide 28

An optimization problem [Loss Optimization Problem]
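In symbols, training approximately solves the following problem over the model parameters (a standard formulation, added here for clarity):

    \theta^{*} = \arg\min_{\theta} \; \frac{1}{N} \sum_{i=1}^{N} L\bigl(f(x_i; \theta),\, y_i\bigr)

where f is the network, L the chosen loss, and (x_i, y_i) the training examples.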

Slide 29

Most widely used:

Slide 30

Gradients

Slide 31

Gradient Descent
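The update rule behind this slide, with learning rate eta:

    \theta_{t+1} = \theta_t - \eta \, \nabla_{\theta} L(\theta_t)

Stochastic gradient descent (SGD) uses the same rule but estimates the gradient on a mini-batch instead of the full dataset.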

Slide 32

Optimizers

Slide 33

Optimizers: SGD with memory
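"Memory" here refers to the momentum (velocity) term; a standard formulation, added for reference, is

    v_{t+1} = \gamma v_t + \eta \, \nabla_{\theta} L(\theta_t), \qquad
    \theta_{t+1} = \theta_t - v_{t+1}

where gamma (typically around 0.9) controls how much of the previous step is remembered.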

Slide 34

Optimizers
● SGD
● SGD with Momentum
● Adam
● RMSProp
● AdaDelta
● ...

Slide 35

As a result, it converges faster

Slide 36

Wire it together with KotlinDL
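The slide presumably shows this wiring as code; the sketch below follows the spirit of KotlinDL's README example. It assumes the model defined earlier plus prepared train/test datasets named train and test; the exact names of Losses, Metrics, and the fit/evaluate parameters are taken from the 0.1/0.2-era API and may differ in current releases.

    // Adam, Losses, Metrics come from org.jetbrains.kotlinx.dl.api.core.{optimizer, loss, metric}.
    model.use {
        // Wire the model together: optimizer + loss + metric.
        it.compile(
            optimizer = Adam(),
            loss = Losses.SOFT_MAX_CROSS_ENTROPY_WITH_LOGITS,
            metric = Metrics.ACCURACY
        )
        // Train on the training dataset.
        it.fit(dataset = train, epochs = 10, batchSize = 100)
        // Evaluate on the held-out test dataset.
        val accuracy = it.evaluate(dataset = test, batchSize = 100)
            .metrics[Metrics.ACCURACY]
        println("Accuracy: $accuracy")
    }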

Slide 37

Activation Functions

Slide 38

Activation functions
● Activation functions transform the outputs of each layer of a neural network
● They are required to add non-linearity
● They should have a derivative (to be used in backward propagation)

Slide 39

Linear

Slide 40

Sigmoid

Slide 41

Tanh

Slide 42

ReLU
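For reference, the four activations shown on these slides (standard definitions):

    \text{linear: } f(x) = x \qquad
    \sigma(x) = \frac{1}{1 + e^{-x}} \qquad
    \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \qquad
    \text{ReLU}(x) = \max(0, x)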

Slide 43

Hm..

Slide 44

Initializers

Slide 45

Vanishing gradient problem

Slide 46

Exploding gradient problem

Slide 47

Initializers
● Zeros
● Ones
● Random [Uniform or Normal]
● Xavier / Glorot
● He
● ...
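For the two "smart" initializers, the weight variance is chosen from the layer's fan-in and fan-out (standard formulas, added for reference):

    \text{Xavier/Glorot: } \operatorname{Var}(W) = \frac{2}{n_{\text{in}} + n_{\text{out}}}
    \qquad
    \text{He: } \operatorname{Var}(W) = \frac{2}{n_{\text{in}}}

This scaling is what keeps activations and gradients from vanishing or exploding as the network gets deeper.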

Slide 48

Layers

Slide 49

Dense
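A dense (fully connected) layer computes, for input vector x, weight matrix W, bias b, and activation f:

    y = f(Wx + b)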

Slide 50

Cat/Dog neural network architecture in Kotlin

Slide 51

Conv2d: filters

Slide 52

Output with filters
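A useful rule of thumb for the spatial size of a convolution's output (standard formula, not from the slide): for input size I, kernel size K, padding P, and stride S,

    O = \left\lfloor \frac{I - K + 2P}{S} \right\rfloor + 1

so a 3x3 kernel with stride 1 and "same" padding (P = 1) preserves the input size, while the 2x2 pooling on the next slide halves it.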

Slide 53

Pooling (subsampling layer)

Slide 54

Dropout

Slide 55

Everything is now available in Kotlin

Slide 56

How does it work?

Slide 57

How does it work?

Slide 58

How does it work?

Slide 59

How does it work?

Slide 60

KotlinDL Demo
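The demo itself is live code; as a stand-in for the transcript, here is a small hedged sketch of what inference looks like after training, inside the same model.use { ... } block as above. The method names (predict, getX) follow the early KotlinDL examples and may have changed in later releases.

    // Classify one image from the test dataset after fit() has finished.
    val imageId = 0
    val predictedLabel = it.predict(test.getX(imageId))   // index of the winning class
    println("Predicted class for image $imageId: $predictedLabel")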

Slide 61

KotlinDL Limitations
1. Currently useful for image recognition and regression ML tasks
2. Only a limited number of layers is supported
3. Only a tiny number of preprocessing methods
4. Only VGG-like architectures are supported
5. No Android support

Slide 62

KotlinDL Roadmap
1. New models: Inception, ResNet, DenseNet
2. Rich Dataset API
3. GPU settings
4. Maven Central availability
5. Functional API
6. New layers: BatchNorm, Add, Concatenate, DepthwiseConv2d
7. Regularization for layers
8. New metrics framework
9. Conversion to TFLite (for mobile devices)
10. ONNX support
11. ML algorithms

Slide 63

Useful links
1. https://github.com/JetBrains/KotlinDL
2. https://kotlinlang.org/docs/data-science-overview.html#kotlin-libraries
3. The #deeplearning channel on the Kotlin Slack (join it if you haven't yet)
4. Feel free to join the discussions on GitHub
5. Follow @zaleslaw and @KotlinForData on Twitter

Slide 64

The End

Slide 65

An LSTM consists of LSTM "neurons" (cells)

Slide 66

5 Major Scientific breakthroughs

Slide 67

5 Major Scientific Breakthroughs
1. Non-linearity in the early '90s (made it possible to solve new classes of problems)
2. ReLU, the simplest and cheapest non-linearity (made it possible to converge on new tasks)
3. Batch Normalization (improves convergence and speeds up training)
4. Xavier/He initialization (addresses the vanishing/exploding gradient problem)
5. The Adam optimizer (trains faster)