Slide 1

Deep Learning and Artificial Intelligence with PyTorch
Valerio Maggio, PhD
@leriomaggio | [email protected]

Slide 2

Who? Me:
- Background in CS
- PhD in Machine Learning for Software Engineering
- Research: ML/DL for BioMedicine
- wearing face masks since…

Slide 3

The AI Revolution

Slide 4

Artificial Intelligence according to Google

Slide 5

Machine Learning
[Diagram: Machine Learning sits within Artificial Intelligence and overlaps with Data Science; a.k.a. "Deep Pattern Matching" ⛑]

Slide 6

ML and Data Science
[Diagram: the Data Science Venn diagram, by Drew Conway]
A (toy) Data Science pipeline: Data Loading → Preprocessing → Model Learning → API Interface (Model)
Adapted from "What about tests in Machine Learning projects?", Sarah Diot-Girard, EuroSciPy 2019

Slide 7

What is Machine Learning?

Slide 8

Machine Learning
"Machine learning is the science (and art) of programming computers so they can learn from data"
Aurélien Géron, Hands-On Machine Learning with Scikit-Learn and TensorFlow
"(ML) focuses on teaching computers how to learn without the need to be programmed for specific tasks"
S. Pal & A. Gulli, Deep Learning with Keras
Source: bit.ly/ml-simple-definition

Slide 9

Machine Learning
"Machine learning teaches machines how to carry out tasks by themselves. It is that simple. The complexity comes with the details."
Luis Pedro Coelho, Building Machine Learning Systems with Python
(and that's probably one of the reasons why you're here :)

Slide 10

(Machine) Learning is about DATA
- Data is one of the most important parts of an ML solution
- Importance: Data >> Model?
- Learning by examples
- Data preparation is crucial!
[Diagram: data vs. algorithms]

Slide 11

BioMedicine: another data case?
- Contemporary Life Science is about data
- Recent advances in sequencing techs and instruments (e.g. "bio-images")
- Huge datasets generated at an incredible pace
- From human observation to data analysis
- Cheminformatics (drug discovery)
- Research Impact → Social and Human Impact

Slide 12

Why Deep Learning, btw?
- A subset of ML with a very specific model: (Deep?) Neural Networks
- State of the art
- Theory from the '50s/'80s; HW acceleration to train
- (~new) learning structure + composability (2018/19)

Slide 13

What about Deep Learning?

Slide 14

Deep Learning
A multi-layer feed-forward neural network that starts with a fully connected input layer, followed by multiple hidden layers of non-linear transformations.
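A minimal sketch of such an architecture in PyTorch; the layer sizes (784, 128, 64, 10) are hypothetical, not from the slides:

```python
import torch.nn as nn

# A multi-layer feed-forward network: a fully connected input layer,
# followed by hidden layers of non-linear transformations.
model = nn.Sequential(
    nn.Linear(784, 128),  # fully connected input layer
    nn.ReLU(),            # non-linear transformation
    nn.Linear(128, 64),   # hidden layer
    nn.ReLU(),
    nn.Linear(64, 10),    # output layer
)
```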

Slide 15

More details…

Slide 16

More details… ReLU | sigmoid | tanh
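A tiny example applying these three activations to a tensor (the input values are illustrative):

```python
import torch

x = torch.linspace(-3.0, 3.0, steps=7)
print(torch.relu(x))     # max(0, x): zeroes out negative values
print(torch.sigmoid(x))  # 1 / (1 + e^-x): squashes into (0, 1)
print(torch.tanh(x))     # squashes into (-1, 1)
```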

Slide 17

More details… Repeat for each layer…

Slide 18

More details… Image Classification Task

Slide 19

More details… Summary: a Neural Network is built from layers, each of which is:
- a matrix multiplication,
- then adds a bias,
- then applies a non-linearity.
Learn values for the parameters W and b (for each layer) using Back-Propagation.
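A from-scratch sketch of one such layer in PyTorch, with hypothetical shapes, letting autograd do the back-propagation step:

```python
import torch

# One layer "by hand": a matrix multiplication, then add a bias,
# then apply a non-linearity.
x = torch.randn(32, 100)                      # batch of 32 observations
W = torch.randn(100, 10, requires_grad=True)  # weights, learned via back-prop
b = torch.zeros(10, requires_grad=True)       # bias, learned via back-prop

h = torch.relu(x @ W + b)  # matmul + bias + non-linearity

# Back-propagation computes dL/dW and dL/db for a (dummy) scalar loss
loss = h.sum()
loss.backward()
print(W.grad.shape, b.grad.shape)  # torch.Size([100, 10]) torch.Size([10])
```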

Slide 20

Machine Learning for dummies (a.k.a. ML explained to computer scientists)
Note: I *am* a computer scientist
(Deep) Machine Learning = (Matrix Multiplication + Random Number Generation), t ≅ 2k

Slide 21

ML/DL basics in a nutshell

Slide 22

Supervised learning
[Diagram: (raw) data → features + labels (the supervision) → ML/DL Model Training → Trained Model; (unseen) data → Test → Predictions]

Slide 23

Unsupervised learning
[Diagram: (raw) data → features (no labels) → ML/DL Model Training → Trained Model; (unseen) data → Test → Similarities/likelihood]

Slide 24

Deep Supervised learning
[Diagram: (raw) data + labels (the supervision) → DL Model Training, with features learned by the model → Trained Model; (unseen) data → Test → Predictions]

Slide 25

Supervised Training Loop
[Diagram: (raw) data → Model (Parameters) → predictions → Loss, compared against the labels → loss → feedback to the Parameters]

Slide 26

Supervised Training Loop breakdown…
- (raw) Data, a.k.a. Observations / Input: items about which we want to predict something. We will usually denote an observation with x.
- Labels, a.k.a. Targets (i.e. Ground Truth): labels corresponding to the observations; these are usually the things being predicted. Following standard ML/DL notation, we will use y to refer to these.
- Model f(x) = ŷ: a mathematical expression or function that takes an observation x and predicts the value of its target label.
- Predictions, a.k.a. Estimates: values of the targets generated by the model, usually referred to as ŷ.
- Parameters, a.k.a. Weights (in DL terminology): the parameters of the model. We will refer to them using w.
- Loss Function L(y, ŷ): a function that compares how far off a prediction is from its target for the observations in the training data. The loss function assigns a scalar real value called the loss; the lower the value of the loss, the better the model is predicting. The loss is usually referred to as L.
Source: D. Rao et al., Natural Language Processing with PyTorch, O'Reilly 2019
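The same terminology mapped onto a minimal PyTorch snippet; the linear model, MSE loss and toy data are assumptions chosen only for illustration:

```python
import torch
import torch.nn as nn

# Toy data: x = observations, y = targets (ground truth)
x = torch.randn(64, 3)
y = torch.randn(64, 1)

f = nn.Linear(3, 1)       # the model f(x) = ŷ; its weights are the parameters w
criterion = nn.MSELoss()  # the loss function L(y, ŷ)
optimizer = torch.optim.SGD(f.parameters(), lr=0.01)  # updates the parameters w

y_hat = f(x)                # predictions ŷ
loss = criterion(y_hat, y)  # a scalar: how far off ŷ is from y
loss.backward()             # gradients of L w.r.t. each parameter w
optimizer.step()            # adjust w to decrease the loss
optimizer.zero_grad()       # reset gradients before the next step
```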

Slide 27

(DL) Terms: everyone on the same page?
- Epochs
- Batches and mini-batch learning
- Parameters vs Hyperparameters (e.g. weights vs layers)
- Loss & Optimiser (e.g. Cross-Entropy & SGD)
- Transfer learning
- Gradient & Backward Propagation
- Tensor
also ref: bit.ly/nvidia-dl-glossary
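Several of these terms show up together in a schematic mini-batch training loop; in this sketch the toy data, sizes and number of epochs are all assumptions:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Batch size and number of epochs are HYPER-parameters (chosen by us);
# model.parameters() (the weights) are parameters (learned from data).
dataset = TensorDataset(torch.randn(256, 3), torch.randn(256, 1))
loader = DataLoader(dataset, batch_size=32, shuffle=True)  # mini-batches

model = nn.Linear(3, 1)
criterion = nn.MSELoss()                                  # loss
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # optimiser (SGD)

for epoch in range(5):               # one epoch = one full pass over the data
    for batch_x, batch_y in loader:  # mini-batch learning
        optimizer.zero_grad()
        loss = criterion(model(batch_x), batch_y)
        loss.backward()              # gradient & backward propagation
        optimizer.step()
```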

Slide 28

Python has its say: Machine Learning / Deep Learning
"There should be one-- and preferably only one --obvious way to do it"
The Zen of Python

Slide 29

Multiple Frameworks?

Slide 30

If someone tells you …

Slide 31

Deep Learning Frameworks: Static Graph vs Dynamic Graph
[Diagram: the computational graph of a Linear (or Dense) model, σ(xᵀW + b), built from X, W and b; a chain of fully connected layers fc1 → … → fc5 producing y and the loss L against y’; epoch 1, batch 1]

Slide 32

Deep Learning Frameworks: Static Graph vs Dynamic Graph
[Diagram: as before, but for epoch 1, batch 2: in the dynamic-graph case the graph is built anew for the new batch (X2, y2)]

Slide 33

Deep Learning Frameworks: Static Graph vs Dynamic Graph
[Diagram: backwards and gradients calculation; static graph: Backprop; dynamic graph: Autograd records the forward operations]

Slide 34

Deep Learning Frameworks: Static Graph vs Dynamic Graph
[Diagram: backwards and gradients calculation; dynamic graph: Autograd records the forward operations and replays them backwards (record & replay) for Backprop]
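A minimal illustration of this record-and-replay behaviour with PyTorch's autograd; the shapes are illustrative:

```python
import torch

# Define-by-run: the graph for σ(xW + b) is recorded while the forward
# pass executes, then replayed backwards by autograd.
x = torch.ones(1, 3)
W = torch.randn(3, 1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)

y = torch.sigmoid(x @ W + b)  # graph is built here, on the fly

loss = y.sum()   # dummy scalar loss
loss.backward()  # autograd replays the recorded graph backwards

print(W.grad, b.grad)  # gradients are now populated
```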

Slide 35

Spoiler Alert: a rundown review of the basic PyTorch features we will see soon.

Slide 36

Tensors, NumPy, Devices
- NumPy-like API
- tensor -> ndarray, ndarray -> tensor
- CUDA support
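A short sketch of these three features (assuming PyTorch and NumPy are installed):

```python
import numpy as np
import torch

a = np.arange(6.0).reshape(2, 3)
t = torch.from_numpy(a)  # ndarray -> tensor (shares memory with `a`)
back = t.numpy()         # tensor -> ndarray (shares memory with `t`)

t2 = torch.randn(2, 3)
if torch.cuda.is_available():  # CUDA support
    t2 = t2.to("cuda")         # move the tensor to the GPU
print(t2.device)
```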

Slide 37

Neural Network Module (nn.Module) subclassing
- Definition of layers (i.e. tensors)
- Definition of the graph (i.e. the network)
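A minimal example of such subclassing; the Net name and layer sizes are made up for illustration:

```python
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        # definition of layers (their weight/bias tensors)
        self.fc1 = nn.Linear(784, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        # definition of the graph (how data flows through the network)
        x = torch.relu(self.fc1(x))
        return self.fc2(x)

net = Net()
out = net(torch.randn(4, 784))  # forward pass on a dummy batch
```

Calling net(x) invokes forward under the hood, which is where the (dynamic) graph gets recorded.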

Slide 38

Loss and Gradients
- optimiser
- criterion & loss
- backprop & update
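A sketch of these three steps, using the Cross-Entropy criterion and SGD optimiser mentioned earlier in the glossary; the model and data are dummies:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 3)
criterion = nn.CrossEntropyLoss()                        # criterion
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)  # optimiser

logits = model(torch.randn(8, 10))                 # dummy batch
loss = criterion(logits, torch.randint(0, 3, (8,)))  # loss

optimizer.zero_grad()
loss.backward()   # backprop: compute the gradients
optimizer.step()  # update: adjust the parameters
```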

Slide 39

Dataset and DataLoader
- transformers (data transforms)
- Dataset
- DataLoader
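A minimal sketch with a hypothetical custom Dataset; the SquaresDataset name and its transform hook are made up for illustration:

```python
import torch
from torch.utils.data import Dataset, DataLoader

class SquaresDataset(Dataset):
    def __init__(self, n=100, transform=None):
        self.x = torch.arange(n, dtype=torch.float32)
        self.transform = transform  # optional per-item transform

    def __len__(self):
        return len(self.x)

    def __getitem__(self, idx):
        item = self.x[idx]
        if self.transform is not None:
            item = self.transform(item)
        return item, item ** 2      # (observation, target)

loader = DataLoader(SquaresDataset(), batch_size=16, shuffle=True)
for xb, yb in loader:               # iterate over mini-batches
    print(xb.shape, yb.shape)       # torch.Size([16]) torch.Size([16])
    break
```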

Slide 40

So….

Slide 41

Deep Learning with PyTorch
Repository: http://github.com/leriomaggio/deep-learning-pytorch
mybinder | Colaboratory (more on this in a few minutes)

Slide 42

Deep Learning Course
Approach: Data Scientist
- Always prefer dev/practical aspects (tools & sw)
- Work on the full pipeline (e.g. data preparation)
- Emphasis on the implementation
Perspective: Researcher
- No off-the-shelf (so no "black-box") solutions
- References and Further Readings to know more

Slide 43

Let’s roll up our sleeves & get on with the hands-on