Slide 1

Slide 1 text

17.11.2020 by Łukasz Gebel & Piotr Czajka

Slide 2

Slide 2 text

Machine Learning: The Bare Math Behind Libraries 23.06.2021 by Łukasz Gebel & Piotr Czajka

Slide 3

Slide 3 text

Super Heroes & Super Powers

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

Agenda ● What is Machine Learning? ● Supervised learning ● Unsupervised learning ● Q&A

Slide 6

Slide 6 text

Machine Learning „Field of study that gives computers the ability to learn without being explicitly programmed.” Arthur Samuel

Slide 7

Slide 7 text

Machine Learning „I consider every method that needs training as intelligent or machine learning method.” Our Lecturer

Slide 8

Slide 8 text

Supervised learning ● It is similar to being taught by a teacher.

Slide 9

Slide 9 text

Supervised learning ● Build a model that performs a particular task: – Prepare a data set consisting of examples & expected outputs

Slide 10

Slide 10 text

Supervised learning ● Build a model that performs a particular task: – Prepare a data set consisting of examples & expected outputs – Present the examples to your model

Slide 11

Slide 11 text

Supervised learning ● Build a model that performs a particular task: – Prepare a data set consisting of examples & expected outputs – Present the examples to your model – Check how it responds (the model’s output values)

Slide 12

Slide 12 text

Supervised learning ● Build a model that performs a particular task: – Prepare a data set consisting of examples & expected outputs – Present the examples to your model – Check how it responds (the model’s output values) – Adjust the model’s params by comparing its output values with the expected output values
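
A minimal sketch of this loop in Python (not the presenters' code): `predict` and `adjust` are hypothetical placeholders for a concrete model, such as the linear regression and gradient descent shown later in the deck.

```python
# A minimal sketch of the supervised learning loop described above.
def train(predict, adjust, params, dataset, epochs=100):
    """dataset is a list of (features, expected_output) pairs."""
    for _ in range(epochs):
        for features, expected in dataset:            # present an example
            output = predict(features, params)        # check how the model responds
            error = output - expected                 # compare with the expected output
            params = adjust(params, features, error)  # adjust the model's params
    return params
```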

Slide 13

Slide 13 text

Neural Networks ● Inspired by biological brain mechanisms ● Many applications: – Computer vision – Speech recognition – Compression

Slide 14

Slide 14 text

Biological Neuron http://www.marekrei.com/blog/wp-content/uploads/2014/01/neuron.png

Slide 15

Slide 15 text

Artificial Neuron ● Inputs (x_1, …, x_n) are the features of a single example ● Multiply each input by its weight, sum everything up and pass the sum as the argument of the activation function (diagram: inputs x_1 … x_n, weights w_0 … w_n, Σ, activation, output y)

Slide 16

Slide 16 text

Activation function ● Sigmoid – Maps the sum of the neuron’s signals to a value from 0 to 1 – Continuous, nonlinear – If the input is positive it gives values > 0.5: f(x) = 1 / (1 + e^(−βx))
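
A small sketch of the artificial neuron from the previous two slides, assuming β = 1; the weights, inputs and bias value below are made-up illustration numbers, not taken from the slides.

```python
import math

def sigmoid(s, beta=1.0):
    """Maps the summed neuron signal to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-beta * s))

def neuron_output(inputs, weights, bias_weight):
    """Weighted sum of the inputs plus the bias, passed through the sigmoid."""
    s = bias_weight + sum(w * x for w, x in zip(weights, inputs))
    return sigmoid(s)

# For a positive summed signal the sigmoid indeed returns a value > 0.5:
print(neuron_output([1.0, 2.0], weights=[0.5, 0.25], bias_weight=0.1))  # ~0.75
```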

Slide 17

Slide 17 text

Linear Regression ● Method for modelling the relationship between variables ● Simplest form: how x relates to y ● Examples: – House size vs house price – Voltage vs electric current

Slide 18

Slide 18 text

Real life problem

Slide 19

Slide 19 text

What defines a superhero?

Slide 20

Slide 20 text

Costume

Slide 21

Slide 21 text

Costume

Slide 22

Slide 22 text

Costume price vs number of issues ● For a given amount of money, predict how many comic book issues you’ll appear in.

Costume price (x) | Number of issues (y)
240 | 6370
480 | 8697
... | ...
26 | 2200
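
The table above as a plain Python structure, in case you want to play with the regression yourself; the rows elided on the slide stay elided here.

```python
# The example data set from the slide, as (costume price, number of issues) pairs.
costume_data = [
    (240, 6370),
    (480, 8697),
    # ...
    (26, 2200),
]
```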

Slide 23

Slide 23 text

Interesting fact ● Invisible Woman’s costume costs $120

Slide 24

Slide 24 text

Linear regression ● Let’s have a function: f(x, Θ) = Θ_1·x + Θ_0, where f(x, Θ) is the number of comic book issues, x is the costume price and Θ are the parameters
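
A one-function sketch of this hypothesis; the parameter values in the usage line are purely illustrative, not fitted to the costume data.

```python
def f(x, theta):
    """Predicted number of comic book issues for a costume price x.
    theta = (theta_0, theta_1) are the line's parameters."""
    return theta[1] * x + theta[0]

# Purely illustrative parameters, not fitted to anything:
print(f(240, (2000.0, 15.0)))   # 2000 + 15 * 240 = 5600
```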

Slide 25

Slide 25 text

Let's plot

Slide 26

Slide 26 text

Let's plot

Slide 27

Slide 27 text

Let's plot

Slide 28

Slide 28 text

Warning!

Slide 29

Slide 29 text

Objective function Q(Θ) = 1/(2N) · Σ_{j=0}^{N} (f(x_j, Θ) − y_j)², where Q(Θ) is the objective function, N is the number of data samples and j is the index of a particular data sample

Slide 30

Slide 30 text

Objective function Q(Θ) = 1/(2N) · Σ_{j=0}^{N} (f(x_j, Θ) − y_j)², where Q(Θ) is the objective function, N is the number of data samples and j is the index of a particular data sample

Slide 31

Slide 31 text

Objective function Q(Θ) = 1/(2N) · Σ_{j=0}^{N} (f(x_j, Θ) − y_j)², where Q(Θ) is the objective function, N is the number of data samples and j is the index of a particular data sample

Slide 32

Slide 32 text

Objective function Q(Θ) = 1/(2N) · Σ_{j=0}^{N} (f(x_j, Θ) − y_j)², where Q(Θ) is the objective function, N is the number of data samples and j is the index of a particular data sample

Slide 33

Slide 33 text

Objective function – intuition: f(x_j, Θ) = 4, y_j = 2, (4 − 2)² = 2² = 4, sum += 4

Slide 34

Slide 34 text

Objective function – intuition: f(x_j, Θ) = 1, y_j = 1, (1 − 1)² = 0² = 0, sum += 0
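
Putting the formula and the two intuition steps together, a minimal sketch of Q(Θ) in Python; the one-line f is the linear hypothesis from the earlier slide.

```python
def f(x, theta):                          # linear hypothesis: theta_1 * x + theta_0
    return theta[1] * x + theta[0]

def objective(theta, samples):
    """Q(theta): half the mean of the squared differences, as in the formula above."""
    total = 0.0
    for x_j, y_j in samples:
        total += (f(x_j, theta) - y_j) ** 2   # e.g. (4 - 2)**2 == 4, (1 - 1)**2 == 0
    return total / (2 * len(samples))
```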

Slide 35

Slide 35 text

Gradient descent ● Find the minimum of the objective function ● Iteratively update the function parameters: Θ_0(t+1) = Θ_0(t) − α · 1/N · Σ_{j=0}^{N} (f(x_j, Θ) − y_j), Θ_1(t+1) = Θ_1(t) − α · 1/N · Σ_{j=0}^{N} (f(x_j, Θ) − y_j) · x_j, where t is the iteration number and α is the learning step

Slide 36

Slide 36 text

Gradient descent ● Find the minimum of the objective function ● Iteratively update the function parameters: Θ_0(t+1) = Θ_0(t) − α · 1/N · Σ_{j=0}^{N} (f(x_j, Θ) − y_j), Θ_1(t+1) = Θ_1(t) − α · 1/N · Σ_{j=0}^{N} (f(x_j, Θ) − y_j) · x_j, where t is the iteration number and α is the learning step

Slide 37

Slide 37 text

Gradient descent ● Find the minimum of the objective function ● Iteratively update the function parameters: Θ_0(t+1) = Θ_0(t) − α · 1/N · Σ_{j=0}^{N} (f(x_j, Θ) − y_j), Θ_1(t+1) = Θ_1(t) − α · 1/N · Σ_{j=0}^{N} (f(x_j, Θ) − y_j) · x_j, where t is the iteration number and α is the learning step
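
A sketch of one gradient-descent iteration for the two parameters above; the learning step α and the stopping rule in the comment are assumptions, and α would need tuning for the costume-price data.

```python
def f(x, theta):                          # linear hypothesis from the earlier slides
    return theta[1] * x + theta[0]

def gradient_descent_step(theta, samples, alpha=0.01):
    """One iteration t -> t+1 of the parameter updates above."""
    n = len(samples)
    grad_0 = sum(f(x_j, theta) - y_j for x_j, y_j in samples) / n
    grad_1 = sum((f(x_j, theta) - y_j) * x_j for x_j, y_j in samples) / n
    return (theta[0] - alpha * grad_0, theta[1] - alpha * grad_1)

# Repeat until Q(theta) stops decreasing, e.g.:
# theta = (0.0, 0.0)
# for t in range(1000):
#     theta = gradient_descent_step(theta, samples, alpha=0.0001)
```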

Slide 38

Slide 38 text

Gradient descent

Slide 39

Slide 39 text

Gradient descent: −α · (positive derivative) < 0, so the parameter decreases

Slide 40

Slide 40 text

Gradient descent: −α · (positive derivative) < 0, so the parameter decreases

Slide 41

Slide 41 text

Gradient descent: −α · (positive derivative) < 0, so the parameter decreases

Slide 42

Slide 42 text

Gradient descent: −α · (positive derivative) < 0, so the parameter decreases

Slide 43

Slide 43 text

Gradient descent: −α · (negative derivative) > 0, so the parameter increases

Slide 44

Slide 44 text

Gradient descent: −α · (negative derivative) > 0, so the parameter increases

Slide 45

Slide 45 text

Gradient descent: −α · (negative derivative) > 0, so the parameter increases

Slide 46

Slide 46 text

Gradient descent: −α · (negative derivative) > 0, so the parameter increases

Slide 47

Slide 47 text

Demo

Slide 48

Slide 48 text

Linearly separable problem

Slide 49

Slide 49 text

Not linearly separable problem

Slide 50

Slide 50 text

NN – random weights init (network diagram: inputs x_1, x_2, bias inputs +1, output y)

Slide 51

Slide 51 text

NN – feed forward (network diagram: inputs x_1, x_2, bias inputs +1, output y)

Slide 52

Slide 52 text

NN – feed forward (network diagram: inputs x_1, x_2, bias inputs +1, output y)

Slide 53

Slide 53 text

NN – feed forward (network diagram: inputs x_1, x_2, bias inputs +1, output y)

Slide 54

Slide 54 text

NN – feed forward (network diagram: inputs x_1, x_2, bias inputs +1, output y)

Slide 55

Slide 55 text

NN – compute error (network diagram; error = (y − expected output)²)

Slide 56

Slide 56 text

NN – backpropagation step ● Use gradient descent and the computed error ● Update every weight of every neuron in the hidden and output layers

Slide 57

Slide 57 text

NN – backpropagation step (network diagram; error = (y − expected output)²)

Slide 58

Slide 58 text

NN – backpropagation step (network diagram: inputs x_1, x_2, bias inputs +1, output y)
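
A compact sketch that ties the last few slides together for the small 2-2-1 network drawn above: random weight init, a feed-forward pass through sigmoid neurons, the error (y − expected output)², and one backpropagation update of every hidden- and output-layer weight. The network shape, α = 0.5 and the XOR-style toy data are assumptions for illustration, and convergence depends on the random init.

```python
import math
import random

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

# Random weight init: one [bias, w1, w2] triple per hidden neuron,
# plus [bias, v1, v2] for the output neuron.
random.seed(0)
hidden = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
output = [random.uniform(-1, 1) for _ in range(3)]
alpha = 0.5  # learning step (an assumed value, not from the slides)

def forward(x1, x2):
    h = [sigmoid(w[0] + w[1] * x1 + w[2] * x2) for w in hidden]
    y = sigmoid(output[0] + output[1] * h[0] + output[2] * h[1])
    return h, y

def backprop(x1, x2, expected):
    h, y = forward(x1, x2)
    error = (y - expected) ** 2
    # Gradient of the squared error through the output neuron's sigmoid.
    delta_out = 2 * (y - expected) * y * (1 - y)
    # Hidden-layer deltas, computed with the output weights *before* the update.
    deltas_h = [delta_out * output[i + 1] * h[i] * (1 - h[i]) for i in range(2)]
    # Gradient-descent update of the output neuron's weights (bias input is +1).
    for j, signal in enumerate([1.0, h[0], h[1]]):
        output[j] -= alpha * delta_out * signal
    # Gradient-descent update of every hidden neuron's weights.
    for i, w in enumerate(hidden):
        for j, signal in enumerate([1.0, x1, x2]):
            w[j] -= alpha * deltas_h[i] * signal
    return error

# A non-linearly-separable toy problem (XOR); whether a given run converges
# depends on the random init and on alpha.
for epoch in range(5000):
    for x1, x2, t in [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]:
        backprop(x1, x2, t)
print([round(forward(a, b)[1], 2) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]])
```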

Slide 59

Slide 59 text

Neural Network „Neural networks are nothing more than randomized optimization.” Another Lecturer

Slide 60

Slide 60 text

Real life problem ● You said it can solve non-linear problems, so let’s generate a superhero logo with it.

Slide 61

Slide 61 text

Unsupervised learning

Slide 62

Slide 62 text

What? They can learn by themselves?

Slide 63

Slide 63 text

What? They can learn by themselves?

Slide 64

Slide 64 text

Why would we let them? ● Less complex mathematical apparatus than in supervised learning. ● It is similar to discovering the world on your own.

Slide 65

Slide 65 text

Why would we let them? Used mostly for sorting and grouping when: ● The sorting key can’t be easily figured out. ● The data is very complex and finding the key is not trivial.

Slide 66

Slide 66 text

Idea behind – groups

Slide 67

Slide 67 text

Idea behind – groups

Slide 68

Slide 68 text

Idea behind – groups

Slide 69

Slide 69 text

Hebbian learning

Slide 70

Slide 70 text

Hebbian learning ● Works similarly to nature ● Great for beginners and biological simulations :) ● Simple Hebbian learning rule: Δw_ij = η · x_ij · y_i, where Δw_ij is the change of the j-th weight of the i-th neuron, η is the learning coefficient, x_ij is the j-th input of the i-th neuron and y_i is the output of the i-th neuron

Slide 71

Slide 71 text

Hebbian learning ● Works similarly to nature ● Great for beginners and biological simulations :) ● Generalised Hebbian learning rule: Δw_ij = F(x_ij, y_i), where Δw_ij is the change of the j-th weight of the i-th neuron, x_ij is the j-th input of the i-th neuron and y_i is the output of the i-th neuron

Slide 72

Slide 72 text

Hebb’s neuron model (diagram: inputs x_1 … x_n plus bias input 1, weights w_0 … w_n, Σ, output y; update rule Δw_ij = F(x_ij, y_i))

Slide 73

Slide 73 text

Hebb’s neuron model – weights: [0.230, 0.010, 0.900], bias weight 0.110; inputs: [0.200, 0.300, 0.100], bias input 1

Slide 74

Slide 74 text

Hebb’s neuron model – weighted inputs: 0.230·0.200 = 0.046, 0.010·0.300 = 0.003, 0.900·0.100 = 0.090, 0.110·1 = 0.110

Slide 75

Slide 75 text

Hebb’s neuron model – sum of the weighted inputs: 0.046 + 0.003 + 0.090 + 0.110 = 0.249

Slide 76

Slide 76 text

Hebb’s neuron model – the sum 0.249 passes through the activation, giving output y = 0.562

Slide 77

Slide 77 text

Hebb’s neuron model – output y = 0.562 (weights [0.230, 0.010, 0.900, 0.110], inputs [0.200, 0.300, 0.100, 1])

Slide 78

Slide 78 text

Hebb’s neuron model – weight changes Δw: +0.011, +0.016, +0.005, and +0.056 for the bias weight (output y = 0.562)

Slide 79

Slide 79 text

Hebb’s neuron model – updated weights: [0.241, 0.026, 0.905], bias weight 0.166; inputs x_1, x_2, x_3, bias 1
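
A sketch that reproduces the worked example above. The sigmoid activation and η = 0.1 are assumptions inferred from the slide's numbers (0.562 ≈ sigmoid(0.249), and the weight changes match η·x_ij·y_i with η = 0.1); the result matches the updated weights up to rounding.

```python
import math

def sigmoid(s, beta=1.0):
    return 1.0 / (1.0 + math.exp(-beta * s))

weights = [0.110, 0.230, 0.010, 0.900]   # [w_0 (bias), w_1, w_2, w_3]
inputs  = [1.0,   0.200, 0.300, 0.100]   # [bias input, x_1, x_2, x_3]
eta = 0.1                                # learning coefficient (inferred)

s = sum(w * x for w, x in zip(weights, inputs))   # 0.110 + 0.046 + 0.003 + 0.090 = 0.249
y = sigmoid(s)                                    # ~0.562

# Simple Hebbian rule: every weight grows by eta * (its input) * (the output).
weights = [w + eta * x * y for w, x in zip(weights, inputs)]
print([round(w, 3) for w in weights])             # ~[0.166, 0.241, 0.027, 0.906]
```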

Slide 80

Slide 80 text

Demo

Slide 81

Slide 81 text

Imagine superhero teams

Slide 82

Slide 82 text

Marvel database to the rescue

Hero | Intelligence | Strength | Speed | Durability | Energy projection | Fighting skills
Iron Man | 6 | 6 | 5 | 6 | 6 | 4
Spiderman | 4 | 4 | 3 | 3 | 1 | 4
Black Panther | 5 | 3 | 2 | 3 | 3 | 5
Wolverine | 2 | 4 | 2 | 4 | 1 | 7
Thor | 2 | 7 | 7 | 6 | 6 | 4
Dr Strange | 4 | 2 | 7 | 2 | 6 | 6
Hulk | 2 | 7 | 3 | 7 | 5 | 4
Cpt. America | 3 | 3 | 2 | 3 | 1 | 6
Mr Fantastic | 6 | 2 | 2 | 5 | 1 | 3
Human Torch | 2 | 2 | 5 | 2 | 5 | 3
Invisible Woman | 3 | 2 | 3 | 6 | 5 | 3
The Thing | 3 | 6 | 2 | 6 | 1 | 5
Luke Cage | 3 | 4 | 2 | 5 | 1 | 4
She Hulk | 3 | 7 | 3 | 6 | 1 | 4
Ms Marvel | 2 | 6 | 2 | 6 | 1 | 4
Daredevil | 3 | 3 | 2 | 2 | 4 | 5
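
The same statistics as a plain Python structure, handy for feeding the heroes into the Hebbian or self-organizing demos; this is only a convenience sketch, not necessarily the format used in the presenters' code.

```python
# Feature vectors per hero, in the order:
# intelligence, strength, speed, durability, energy projection, fighting skills.
heroes = {
    "Iron Man":        [6, 6, 5, 6, 6, 4],
    "Spiderman":       [4, 4, 3, 3, 1, 4],
    "Black Panther":   [5, 3, 2, 3, 3, 5],
    "Wolverine":       [2, 4, 2, 4, 1, 7],
    "Thor":            [2, 7, 7, 6, 6, 4],
    "Dr Strange":      [4, 2, 7, 2, 6, 6],
    "Hulk":            [2, 7, 3, 7, 5, 4],
    "Cpt. America":    [3, 3, 2, 3, 1, 6],
    "Mr Fantastic":    [6, 2, 2, 5, 1, 3],
    "Human Torch":     [2, 2, 5, 2, 5, 3],
    "Invisible Woman": [3, 2, 3, 6, 5, 3],
    "The Thing":       [3, 6, 2, 6, 1, 5],
    "Luke Cage":       [3, 4, 2, 5, 1, 4],
    "She Hulk":        [3, 7, 3, 6, 1, 4],
    "Ms Marvel":       [2, 6, 2, 6, 1, 4],
    "Daredevil":       [3, 3, 2, 2, 4, 5],
}
```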

Slide 83

Slide 83 text

Hebbian learning weaknesses ● Unstable. ● Prone to growing the weights ad infinitum. ● Some groups can trigger no response. ● Some groups may trigger a response from too many neurons.

Slide 84

Slide 84 text

And now – self organising networks

Slide 85

Slide 85 text

Self organising?

Slide 86

Slide 86 text

Learning with concurrency (competitive learning) ● You try to generalize the input vectors in the neuron’s weights vector. ● Instead of checking the reaction to the input, you check the distance between the two vectors. ● Ideally, each neuron specializes in generalizing one class. ● Two main strategies: – Winner Takes All (WTA) – Winner Takes Most (WTM)

Slide 87

Slide 87 text

Idea behind – neuron weights: [3.0, 2.0, 2.0]; example: [1.0, 2.0, 3.0]

Slide 88

Slide 88 text

Idea behind – neuron weights: [3.0, 2.0, 2.0]; example: [1.0, 2.0, 3.0]; distance d_i = w_i − x_i = [2.0, 0.0, −1.0]; Euclidean distance √(Σ_{i=1}^{n} d_i²) = √5

Slide 89

Slide 89 text

Idea behind – neuron weights: [3.0, 2.0, 2.0]; example: [1.0, 2.0, 3.0]; distance d_i = w_i − x_i = [2.0, 0.0, −1.0]; learning coefficient η = 0.100; learning step Δw_i = η·d_i = [0.2, 0.0, −0.1]

Slide 90

Slide 90 text

Idea behind – updated weights w′_i = w_i − Δw_i: 2.8 = 3.0 − 0.2, 2.0 = 2.0 − 0.0, 2.1 = 2.0 − (−0.1) (example [1.0, 2.0, 3.0], distance d_i = [2.0, 0.0, −1.0], learning step Δw_i = η·d_i = [0.2, 0.0, −0.1], η = 0.100)

Slide 91

Slide 91 text

Idea behind – neuron weights after the update: [2.8, 2.0, 2.1]
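
A minimal sketch of the single-neuron update walked through above; in a full WTA network you would first pick the neuron with the smallest distance to the example and update only that winner (WTM would also update its neighbours, more weakly).

```python
import math

weights = [3.0, 2.0, 2.0]   # the neuron's weight vector
example = [1.0, 2.0, 3.0]   # the input vector
eta = 0.1                   # learning coefficient, as on the slides

# Distance between the two vectors instead of a "reaction" to the input.
d = [w - x for w, x in zip(weights, example)]        # [2.0, 0.0, -1.0]
euclidean = math.sqrt(sum(di ** 2 for di in d))      # sqrt(5)

# Pull the neuron's weights towards the example by a fraction eta of the distance.
step = [eta * di for di in d]                        # [0.2, 0.0, -0.1]
weights = [w - s for w, s in zip(weights, step)]     # [2.8, 2.0, 2.1]
```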

Slide 92

Slide 92 text

Idea behind – initial setup (plot: neurons 1, 2, 3)

Slide 93

Slide 93 text

Idea behind – mid learning (plot: neurons 1, 2, 3)

Slide 94

Slide 94 text

Idea behind – homeostasis (plot: neurons 1, 2, 3)

Slide 95

Slide 95 text

Demo time

Slide 96

Slide 96 text

Learning with concurrency ● Gives more diverse groups. ● Less prone to clustering (than Hebb’s). ● Searches a wider spectrum of answers. ● A first step towards more complex networks.

Slide 97

Slide 97 text

Learning with concurrency – weaknesses ● WTA – works best if the training examples are evenly distributed in the solution space. ● WTM – works best if the weight vectors are evenly distributed in the solution space. ● Can still get stuck in a local optimum.

Slide 98

Slide 98 text

Teuvo Kohonen’s SOM Created by this nice guy here

Slide 99

Slide 99 text

Kohonen’s self-organizing map ● The most popular self-organizing network with a concurrency (competitive) learning algorithm. ● It teaches groups of neurons with the WTM algorithm. ● Special features: – Neurons are organised in a grid – Nevertheless, they are treated as a single layer

Slide 100

Slide 100 text

Kohonen’s self-organizing map w_ij(s+1) = w_ij(s) + Θ(k_best, i, s) · η(s) · (I_j(s) − w_ij(s)), where s is the epoch number, k_best is the best-matching neuron, w_ij(s) is the j-th weight of the i-th neuron, Θ(k_best, i, s) is the neighbourhood function, η(s) is the learning coefficient for epoch s and I_j(s) is the j-th chunk of the example for epoch s
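
A minimal sketch of this update rule, assuming a 1-D neuron grid and a Gaussian neighbourhood function Θ (the slide leaves both open); k_best is found as the neuron whose weights lie closest to the example. In practice η and the neighbourhood radius shrink as the epochs go by.

```python
import math
import random

def best_matching(weights, example):
    """k_best: the neuron whose weight vector lies closest to the example."""
    return min(range(len(weights)),
               key=lambda i: sum((w - x) ** 2 for w, x in zip(weights[i], example)))

def som_update(weights, example, eta, sigma):
    """One update of every neuron's weights towards the current example.
    Call it per example, shrinking eta and sigma as the epochs go by."""
    k_best = best_matching(weights, example)

    def theta(i):
        # Neighbourhood function: 1 for the winner, decaying with grid distance.
        return math.exp(-((i - k_best) ** 2) / (2 * sigma ** 2))

    for i, w in enumerate(weights):
        for j in range(len(w)):
            w[j] += theta(i) * eta * (example[j] - w[j])

# Example: 4 neurons on a 1-D grid, each with 6 weights (one per hero statistic).
random.seed(1)
neurons = [[random.uniform(1, 7) for _ in range(6)] for _ in range(4)]
som_update(neurons, [6, 6, 5, 6, 6, 4], eta=0.5, sigma=1.0)
```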

Slide 101

Slide 101 text

SOM model By Mcld - Own work, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=10373592

Slide 102

Slide 102 text

Common weaknesses of artificial neuron systems ● We are still dependent on randomized weights. ● All of these algorithms can get stuck in a local optimum.

Slide 103

Slide 103 text

Bibliography ● Presentation + code: https://bitbucket.org/medwith/public/downloads/mluvr-MLCon.zip ● https://www.coursera.org/learn/machine-learning ● https://www.coursera.org/specializations/deep-learning ● Math for Machine Learning - Amazon Training and Certification ● Linear and Logistic Regression - Amazon Training and Certification ● Grus J., Data Science from Scratch: First Principles with Python ● Patterson J., Gibson A., Deep Learning: A Practitioner's Approach ● Trask A., Grokking Deep Learning ● Stroud K. A., Booth D. J., Engineering Mathematics ● https://github.com/massie/octave-nn - neural network Octave implementation ● https://www.desmos.com/calculator/dnzfajfpym - Nanananana … Batman equation ;) ● https://xkcd.com/605/ - extrapolating ;) ● http://dilbert.com/strip/2013-02-02 - Dilbert & Machine Learning

Slide 104

Slide 104 text

Thank you