
Before we begin: The mathematical building blocks of neural networks

tanimutomo
May 17, 2019

This deck summarizes Chapter 2 of the book "Deep Learning with Python" by Francois Chollet.
The book PDF is available here (http://faculty.neu.edu.cn/yury/AAI/Textbook/Deep%20Learning%20with%20Python.pdf).
The Japanese edition is "PythonとKerasによるディープラーニング" (https://amzn.to/2oImjwS).
The deck explains the concepts of neural networks, tensor data, and optimization with many figures.


Transcript

  1. This chapter covers • A first example of a neural network • Tensor and tensor
     operations • How neural networks learn via backpropagation and gradient descent.
     Our goal in this chapter will be to build your intuition about these notions
     without getting overly technical.
  2. MNIST Dataset • MNIST (Mixed National Institute of Standards and Technology) •
     Classify grayscale images of handwritten digits (28 x 28 pixels) into 10
     categories (0 through 9) • 60,000 training images + 10,000 test images
     [Figure: sample digit images with their labels]
  3. Load MNIST Dataset • Import keras (python library) • The model is trained on
     the training dataset and then evaluated on the test dataset (a loading sketch
     follows below)
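    As a concrete sketch of what the slide's loading code does (this mirrors the
    book's Keras example; the shapes in the comments are the dataset's actual sizes):

        # Load the MNIST dataset with Keras.
        from keras.datasets import mnist

        (train_images, train_labels), (test_images, test_labels) = mnist.load_data()

        print(train_images.shape)  # (60000, 28, 28)
        print(test_images.shape)   # (10000, 28, 28)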
  4. The Flow of Machine Learning • Training: 1. Input 2. Calculate 3. Output
     4. Feedback & Revision • Test: the trained model receives an input whose label
     is 5, calculates, and outputs 5: Correct!
  5. Example of Model • Define the neural network model • The model consists of many
     layers that transform the data • Each layer extracts representations • On the
     slide: import models and layers from keras (python library), define the model,
     and add the calculation layers (a minimal sketch follows below)
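    A minimal sketch of the model the book builds at this point (the layer sizes,
    512 and 10, follow the book's MNIST example):

        from keras import models, layers

        # A simple stack of Dense (fully connected) layers.
        network = models.Sequential()
        network.add(layers.Dense(512, activation='relu', input_shape=(28 * 28,)))
        network.add(layers.Dense(10, activation='softmax'))  # one output per digit class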
  6. Configuration for training the model • A loss function • An index of how much
     the model fails to predict correctly • The model is trained to minimize the
     loss function • Optimizer • An algorithm that optimizes the model • How the
     error is fed back to the model • Metrics • e.g. accuracy (the compile call is
     sketched below)
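    Continuing the model above, these three choices are passed to compile(); the
    values below are the ones the book uses for MNIST:

        network.compile(optimizer='rmsprop',              # how to update the weights
                        loss='categorical_crossentropy',  # what to minimize
                        metrics=['accuracy'])             # what to monitor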
  7. The training flow • [Figure: an image labeled 8 is fed through the network,
     which outputs class scores and predicts 6; the loss function measures the
     mismatch, and the optimizer feeds the error back into the weights]
  8. This chapter covers • A first example of a neural network • Tensor and tensor
     operations • How neural networks learn via backpropagation and gradient descent.
     Our goal in this chapter will be to build your intuition about these notions
     without getting overly technical.
  9. Data representations for NN • All current machine-learning systems use tensors
     as their basic data structure • A tensor is a container for numbers • A tensor
     is a generalization of matrices to an arbitrary number of dimensions • In
     tensors, a dimension is often called an axis • 0D tensors, 1D tensors, 2D
     tensors, 3D tensors, 4D tensors, …
  10. Let's try numpy data operations • Go to https://paiza.io/en • How to use
     Python • print( … ) : writes … to the standard output • How to use Numpy, the
     tensor-operation (numerical calculation) library for Python • First, “import
     numpy as np” • x = np.array([…]) : creates an array x from … (basics are
     sketched below)
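    A short sketch of the NumPy basics the slide walks through (the values are
    illustrative):

        import numpy as np

        x = np.array(12)                # 0D tensor (scalar)
        v = np.array([12, 3, 6, 14])    # 1D tensor (vector)
        m = np.array([[5, 78, 2, 34],
                      [6, 79, 3, 35]])  # 2D tensor (matrix)

        # ndim gives the number of axes (dimensions).
        print(x.ndim, v.ndim, m.ndim)   # 0 1 2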
  11. Batch Sampling • A batch is the set of samples fed into the model and
     processed together • [Figure: a batch of four images with labels [8, 3, 6, 4];
     1. Input 2. Calculate 3. Output [6, 2, 3, 4] 4. Feedback & Revision; batch
     size = 4]
  12. Create the batch • Create the batch samples using the slicing operation • All
     batches are the same size (a slicing sketch follows below)
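    A small sketch of batch slicing, assuming train_images from the loading snippet
    above; the batch size of 128 matches the fit() call at the end of the deck:

        batch_size = 128

        # The n-th batch is a contiguous slice of the training images.
        n = 0
        batch = train_images[n * batch_size:(n + 1) * batch_size]
        print(batch.shape)  # (128, 28, 28)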
  13. Real-world examples of tensor data • Vector data • 2D tensor of shape
     (samples, features) • Timeseries data or sequence data • 3D tensor of shape
     (samples, timesteps, features) • Images • 4D tensor of shape (samples, height,
     width, channels) • Videos • 5D tensor of shape (samples, frames, h, w,
     channels) (shapes are illustrated below)
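    Purely illustrative shapes for each case (the sizes are made up for the sketch):

        import numpy as np

        vector_data = np.zeros((1000, 20))             # (samples, features)
        timeseries  = np.zeros((250, 1440, 3))         # (samples, timesteps, features)
        images      = np.zeros((128, 28, 28, 1))       # (samples, height, width, channels)
        videos      = np.zeros((4, 240, 144, 256, 3))  # (samples, frames, h, w, channels)
        print(videos.ndim)  # 5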
  14. Timeseries data or Sequence data • e.g. a dataset of stock prices: for each
     date (sample), the Max, Min, and Now prices (features) at every minute
     (timesteps) form one table; the tables for all dates stack into a 3D tensor

         Time  0:00  0:01  0:02  0:03  0:04  0:05  0:06  …  23:59
         Max   2     9     7     56    8     6     8     …  4
         Min   0     1     5     98    6     4     3     …  9
         Now   6     7     9     6     4     67    98    …  1
  15. Videos • Samples x Frames x Height x Width x Channels • [Figure: each sample
     is a stack of frames]
  16. This chapter covers • A first example of a neural network • Tensor and tensor
     operations • How neural networks learn via backpropagation and gradient descent.
     Our goal in this chapter will be to build your intuition about these notions
     without getting overly technical.
  17. Tensor Operations • [Figure: the training-flow diagram again; each layer's
     computation, e.g. 0.01 * 3 + 2 or 0.2 * 0.01 + 0.2, is a tensor operation]
  18. The detail of the Dense layer • A Dense layer is defined by the following
     calculation • The output is the larger of 0 and (input) * W + b (a NumPy
     sketch follows below)
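    A hedged NumPy sketch of what a Dense layer with relu activation computes; the
    shapes are illustrative (784 and 512 follow the MNIST model above):

        import numpy as np

        def relu(x):
            return np.maximum(x, 0.)

        # output = relu(dot(input, W) + b)
        def dense(inputs, W, b):
            return relu(np.dot(inputs, W) + b)

        x = np.random.random((1, 784))    # one flattened 28 x 28 image
        W = np.random.random((784, 512))  # weight matrix
        b = np.random.random((512,))      # bias vector
        print(dense(x, W, b).shape)       # (1, 512)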
  19. Element-wise operations • The ReLU operation and addition are element-wise
     operations • One implementation in native Python, one based on Numpy • In
     Numpy these operations are implemented in Fortran or C via BLAS, which is much
     faster than native Python (both versions are sketched below)
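    The contrast the slide draws, following the book's naive implementation:

        import numpy as np

        # Naive element-wise relu in pure Python: two explicit loops.
        def naive_relu(x):
            assert len(x.shape) == 2
            x = x.copy()
            for i in range(x.shape[0]):
                for j in range(x.shape[1]):
                    x[i, j] = max(x[i, j], 0)
            return x

        x = np.random.random((20, 100)) - 0.5  # include negative values

        # The NumPy one-liner does the same work in optimized low-level code.
        assert np.array_equal(naive_relu(x), np.maximum(x, 0))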
  20. Broadcast Operation • Makes it possible to add two tensors whose shapes
     differ • The smaller tensor is repeated along new axes to match the full shape
     of the larger tensor (see the sketch below)
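    A small broadcasting sketch (the shapes are illustrative):

        import numpy as np

        X = np.random.random((32, 10))  # larger tensor
        y = np.random.random((10,))     # smaller tensor

        # Conceptually y is repeated 32 times along a new first axis so its
        # shape matches X, then the addition happens element-wise.
        Z = X + y
        print(Z.shape)  # (32, 10)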
  21. Dot Operation (tensor product) • The most common and useful tensor operation •
     The slide shows the vector dot product and the matrix dot product, each as a
     Numpy operation and as a mathematical operation (sketched below)
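    Mathematically, the vector dot product is z = sum_i x_i * y_i; in NumPy both
    cases use np.dot (the shapes are illustrative):

        import numpy as np

        # Vector dot product: z = sum_i x[i] * y[i], a scalar.
        x = np.random.random((32,))
        y = np.random.random((32,))
        print(np.dot(x, y).shape)  # ()

        # Matrix dot product: (3, 4) . (4, 5) -> (3, 5).
        A = np.random.random((3, 4))
        B = np.random.random((4, 5))
        print(np.dot(A, B).shape)  # (3, 5)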
  22. Tensor reshaping • Convert the shape of the tensor • Reshaping is often used
     for “transposition” (see the sketch below)
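    A reshaping and transposition sketch (the example array follows the book's):

        import numpy as np

        x = np.array([[0., 1.],
                      [2., 3.],
                      [4., 5.]])
        print(x.shape)                  # (3, 2)
        print(x.reshape((6, 1)).shape)  # (6, 1)
        print(x.reshape((2, 3)).shape)  # (2, 3)

        # Transposition exchanges rows and columns.
        print(np.transpose(x).shape)    # (2, 3)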
  23. Geometric Interpretation of Deep Learning • A neural network is a chain of
     simple tensor operations, each of which is just a geometric transformation of
     the input data • What a neural network (or any other machine-learning model)
     is meant to do is figure out a transformation of the paper ball that would
     uncrumple it, so as to make the two classes cleanly separable again •
     Uncrumpling paper balls is what machine learning is about: finding neat
     representations for complex, highly folded data manifolds
  24. This chapter covers • A first example of a neural network • Tensor and tensor
     operations • How neural networks learn via backpropagation and gradient descent.
     Our goal in this chapter will be to build your intuition about these notions
     without getting overly technical.
  25. Optimization • [Figure: the training-flow diagram again; this section focuses
     on the optimizer's feedback step]
  26. Update the weights • The weight matrices (trainable parameters) are filled
     with small random values • There's no reason to expect that
     relu(dot(W, input) + b), when W and b are random, will yield any useful
     representations • The weights are adjusted gradually, starting from the random
     values, until the model predicts accurately • This process is called “training”
     or the “training loop”
  27. The flow of training • Draw a batch of training samples x and corresponding
     targets y • Run the network on x to obtain predictions y_pred • Compute the
     loss of the network on the batch, a measure of the mismatch between y_pred and
     y • Update all weights of the network in a way that slightly reduces the loss
     on this batch (see the sketch after this list)
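    A self-contained toy loop that follows these four steps on a linear model (all
    names and data here are illustrative, not from the book):

        import numpy as np

        rng = np.random.default_rng(0)
        X = rng.random((100, 3))          # 100 samples, 3 features
        true_w = np.array([1., -2., 3.])
        y = X @ true_w                    # targets from a known linear rule

        w = np.zeros(3)                   # initial weights
        lr, batch_size = 0.3, 10

        for step in range(200):
            idx = rng.choice(100, size=batch_size)  # 1. draw a batch
            x_b, y_b = X[idx], y[idx]
            y_pred = x_b @ w                        # 2. run the network on x
            loss = np.mean((y_pred - y_b) ** 2)     # 3. compute the loss (MSE)
            grad = 2 * x_b.T @ (y_pred - y_b) / batch_size
            w -= lr * grad                          # 4. update weights to reduce the loss

        print(np.round(w, 2))  # approaches [ 1. -2.  3.]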
  28. Overview of the optimization • [Figure: the training-flow diagram again, with
     the loss function and the optimizer's feedback highlighted]
  29. Overview of the optimization • 1. Predict (calculation): the network outputs
     6 for an input labeled 8 • 2. Calculate the loss: |6 - 8| = 2, a measure of
     the mismatch • 3. Update the weights: W: 1 => 2, b: 2 => 3
  30. How to update the weights • Update the weights using the “gradient” of the
     loss with regard to the network's coefficients • The loss function must be
     differentiable (a toy sketch follows below)
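    A minimal, self-contained sketch of gradient updates on a toy differentiable
    loss (w - 3)**2, whose gradient is 2 * (w - 3); this is not the book's code:

        w = 0.0             # initial weight
        learning_rate = 0.1

        for step in range(100):
            gradient = 2 * (w - 3)          # d(loss)/dw for loss = (w - 3)**2
            w -= learning_rate * gradient   # step against the gradient

        print(round(w, 4))  # close to 3.0, where the loss is minimal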
  31. Optimizer • The method (update rule) used to update the weights • SGD,
     AdaGrad, RMSProp, …
  32. Global and Local Minimum • The purpose of the training is to reach the global
     minimum of the loss • But with a simple optimizer, the model's weights may
     converge to a local minimum
  33. Momentum • An optimization technique invented to avoid converging to a local
     minimum • When updating the weights, an optimizer with momentum takes the
     history of past updates into account (sketched below)
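    A hedged sketch of a momentum update on the same toy loss (w - 3)**2; the
    velocity term is what remembers the past updates (the constants are
    illustrative):

        w, velocity = 0.0, 0.0
        learning_rate, beta = 0.1, 0.9  # beta weights the past updates

        for step in range(200):
            gradient = 2 * (w - 3)                                 # d(loss)/dw
            velocity = beta * velocity - learning_rate * gradient  # remember past updates
            w += velocity                                          # momentum update

        print(round(w, 3))  # close to 3.0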
  34. Look back at our first example • Training loop (the configuration for
     optimizing the model) • fit • the method that starts iterating on the training
     data in mini-batches of 128 samples, 5 times over • epoch • each iteration over
     all the training data • After these 5 epochs, the network will have performed
     2,345 gradient updates (469 per epoch), and the loss of the network will be
     sufficiently low that the network will be capable of classifying handwritten
     digits with high accuracy (the call is sketched below)
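    The corresponding Keras call from the book (train_images and train_labels as
    loaded earlier, after the preprocessing the book applies: flattening the images
    and one-hot encoding the labels), with the slide's arithmetic spelled out:

        # Five epochs over the 60,000 training images in mini-batches of 128.
        network.fit(train_images, train_labels, epochs=5, batch_size=128)

        # Gradient updates per epoch: ceil(60000 / 128) = 469
        # Total over 5 epochs:        469 * 5         = 2345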