Slide 1

Slide 1 text

Neural Networks - A brief introduction by a fan

Slide 2

Slide 2 text

If you want to know more: neuralnetworksanddeeplearning.com, 3blue1brown

Slide 3

Slide 3 text

Neural networks?

Slide 4

Slide 4 text

Recognising handwritten numbers

Slide 5

Slide 5 text

Recognising handwritten numbers

Slide 6

Slide 6 text

Neurons

Slide 7

Slide 7 text

Neurons

Slide 8

Slide 8 text

The Perceptron: inputs x1, x2, x3 with weights w1, w2, w3, and an activation function applied to the weighted sum (x1 * w1) + (x2 * w2) + (x3 * w3)

Slide 9

Slide 9 text

The Perceptron: fires when (x1 * w1) + (x2 * w2) + (x3 * w3) >= threshold. Move the threshold to the other side of the inequality: bias = threshold * -1, so the condition becomes ((x1 * w1) + (x2 * w2) + (x3 * w3)) + bias >= 0

Slide 10

Slide 10 text

The Perceptron - do I go to the park? Inputs: Is it raining? (weight 3), Is the temp above 15 C? (weight 1), Am I in lockdown due to COVID-19? (weight -20). Threshold = 3, so bias = -3. Activation = ((1 * 3) + (0 * 1) + (1 * -20)) - 3 > 0? = (-17) - 3 > 0 = -20 > 0, which is false, so Output = 0
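To make the example concrete, here is a tiny Python sketch (my own illustration, not from the slides) of a perceptron with a bias, evaluated on the park question above:

```python
def perceptron(inputs, weights, bias):
    """Fire (return 1) if the weighted sum of inputs plus bias is positive."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if weighted_sum + bias > 0 else 0

# "Do I go to the park?" example from the slide:
# inputs:  raining? = 1, temp above 15 C? = 0, in lockdown? = 1
# weights: 3, 1, -20; bias = threshold * -1 = -3
print(perceptron([1, 0, 1], [3, 1, -20], -3))  # prints 0 - stay home
```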

Slide 11

Slide 11 text

Activation function

Slide 12

Slide 12 text

Activation function - Sigmoid: applied to the sum of weights times inputs, plus bias
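The slide gives the idea in words; as a small illustration (my own code, not from the talk), a sigmoid neuron computes sigmoid(z) = 1 / (1 + e^-z) on the weighted sum plus bias, so the output becomes a smooth value between 0 and 1 instead of a hard 0/1 step:

```python
import math

def sigmoid(z):
    """Squashes any real number into the range (0, 1)."""
    return 1 / (1 + math.exp(-z))

def sigmoid_neuron(inputs, weights, bias):
    """Sum of weights times inputs, plus bias, passed through the sigmoid."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(z)

# Same park inputs as the perceptron example: the output is now a smooth value
print(sigmoid_neuron([1, 0, 1], [3, 1, -20], -3))  # prints a value very close to 0 (about 2e-09)
```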

Slide 13

Slide 13 text

Neural networks - Structure

Slide 14

Slide 14 text

Handwritten numbers - Structure

Slide 15

Slide 15 text

Handwritten numbers - Structure

Slide 16

Slide 16 text

Handwritten numbers - Structure

Slide 17

Slide 17 text

Handwritten numbers - Structure

Slide 18

Slide 18 text

Handwritten numbers - Structure

Slide 19

Slide 19 text

Handwritten numbers - Structure

Slide 20

Slide 20 text

How does it work?
● Start with random numbers for all weights and biases.
● Train the network with training examples.
● Assess how well it did by comparing the actual and desired outputs, using a cost function (or loss function) to compute the error.
● Try to reduce this error by tuning the weights and biases in the network.

Slide 21

Slide 21 text

Cost function - outputs. Training a network to recognise a 6.
Desired: 0 0 0 0 0 0 1 0 0 0 0

Slide 22

Slide 22 text

Cost function - outputs. Training a network to recognise a 6.
Desired: 0 0 0 0 0 0 1 0 0 0 0
Actual: 0.3 0 0.5 0.2 0 0.1 0.3 0 0 0.9 0.8

Slide 23

Slide 23 text

Cost function - outputs. Training a network to recognise a 6.
Desired: 0 0 0 0 0 0 1 0 0 0 0
Actual: 0.3 0 0.5 0.2 0 0.1 0.3 0 0 0.9 0.8
|Difference|: 0.3 0 0.5 0.2 0 0.1 0.7 0 0 0.9 0.8

Slide 24

Slide 24 text

Cost function - training a network: C(w, b) = 1/(2n) * sum over all training inputs x of ||desired(x) - actual(x)||^2, where n is the number of training inputs and w, b are all the weights and biases in the network. Each term is the squared difference between the desired and actual output vectors.
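As a rough illustration (my own code, assuming the quadratic cost above and reusing the desired/actual vectors from slides 21-23), the cost for a single training example can be computed like this; the full cost averages this over all n training inputs:

```python
def quadratic_cost(desired, actual):
    """Quadratic cost for one training example: half the sum of squared differences."""
    return 0.5 * sum((d - a) ** 2 for d, a in zip(desired, actual))

desired = [0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0]             # "this is a 6"
actual  = [0.3, 0, 0.5, 0.2, 0, 0.1, 0.3, 0, 0, 0.9, 0.8]
print(quadratic_cost(desired, actual))  # about 1.17 - a long way from 0, so the network is doing badly
```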

Slide 25

Slide 25 text

Cost function: C(w,b) ≈ 0 when the network's outputs are close to the desired outputs for the training inputs - otherwise C(w,b) is large, and the network is doing badly.

Slide 26

Slide 26 text

How to minimise a function? C(w) = w^2

Slide 27

Slide 27 text

C(w) = w^2, so dC/dw = 2w. Setting 2w = 0 gives w = 0, the minimum.

Slide 28

Slide 28 text

How to minimise a function? C(w1, w2)
● Two variables = 3D graph
● 3+ variables = ??? puny human brain. But that’s fine, we can use the derivative.
● Use partial differentiation to understand the derivative of a function with multiple inputs (sketched below)
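As a rough illustration of the last bullet (my own toy example, not from the talk): with more than one input you take one partial derivative per input, which you can even estimate numerically by nudging each input in turn:

```python
def cost(w1, w2):
    """Toy two-variable cost; its minimum is at w1 = w2 = 0."""
    return w1 ** 2 + w2 ** 2

def gradient(w1, w2, eps=1e-6):
    """Numerical partial derivatives: nudge each input a little and see how C changes."""
    dC_dw1 = (cost(w1 + eps, w2) - cost(w1, w2)) / eps
    dC_dw2 = (cost(w1, w2 + eps) - cost(w1, w2)) / eps
    return (dC_dw1, dC_dw2)

print(gradient(3.0, -2.0))  # roughly (6.0, -4.0), i.e. (2 * w1, 2 * w2)
```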

Slide 29

Slide 29 text

No content

Slide 30

Slide 30 text

● Start at a random value for the input
● Work out the gradient at this point (the derivative)
● Determine how we should change the input to ‘descend’ down the slope depending on the current gradient
● Repeat until you reach a local minimum

Slide 31

Slide 31 text

● Start at a random value for the input
● Work out the gradient at this point (the derivative)
● Determine how we should change the input to ‘descend’ down the slope depending on the current gradient
● Repeat until you reach a local minimum (sketched below)
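A minimal sketch of that loop (my own code, not from the slides), applied to the one-variable C(w) = w^2 from slide 27; the starting value and the 0.1 learning rate are arbitrary choices:

```python
def dC_dw(w):
    """Derivative of the example cost C(w) = w^2."""
    return 2 * w

w = 5.0                 # start at a random-ish value
learning_rate = 0.1     # arbitrary step size
for step in range(50):
    gradient = dC_dw(w)
    w = w - learning_rate * gradient   # step 'downhill', against the slope
print(w)  # ends up very close to 0, the minimum of C(w) = w^2
```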

Slide 32

Slide 32 text

Gradient Descent + Tuning Inputs
● We have our cost function

Slide 33

Slide 33 text

Gradient Descent + Tuning Inputs
● We have our cost function
● We have the gradient of C for the current inputs - some shiny maths (vector of partial derivatives of C for each variable in the system).

Slide 34

Slide 34 text

Gradient Descent + Tuning Inputs
● We have our cost function
● We have the gradient of C for the current inputs - some shiny maths (vector of partial derivatives of C for each variable in the system).
● Use gradient descent to work out the change we want to make to each variable’s current value - the gradient times a parameter called the learning rate

Slide 35

Slide 35 text

Gradient Descent + Tuning Inputs
● We have our cost function
● We have the gradient of C for the current inputs - some shiny maths (vector of partial derivatives of C for each variable in the system).
● Use gradient descent to work out the change we want to make to each variable’s current value - the gradient times a parameter called the learning rate
● Produces a list of changes/nudges for every weight and bias in the system (sketched below)
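To illustrate that last bullet (made-up gradient and parameter values, purely for illustration): each nudge is minus the learning rate times that parameter's partial derivative, and every weight and bias gets its own nudge:

```python
learning_rate = 0.5

# Made-up partial derivatives of C with respect to two weights and a bias
gradients = {"w1": 0.8, "w2": -0.2, "b1": 0.05}
params    = {"w1": 1.5, "w2": -3.0, "b1": 0.1}

# The nudge for each parameter is minus the learning rate times its partial derivative
nudges = {name: -learning_rate * grad for name, grad in gradients.items()}
params = {name: value + nudges[name] for name, value in params.items()}

print(nudges)  # roughly {'w1': -0.4, 'w2': 0.1, 'b1': -0.025}
print(params)  # roughly {'w1': 1.1, 'w2': -2.9, 'b1': 0.075}
```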

Slide 36

Slide 36 text

No content

Slide 37

Slide 37 text

Summary: How it learns
● Start with random numbers for all weights and biases.
● Train the network with training examples.
● Assess how well it did at recognising the numbers using a cost function.
● Minimise the cost function using gradient descent, creating a list of small nudges to the current values.
● Update all weights + biases for all neurons in one layer, then do the same process for every neuron in the previous layer (backpropagation).
● Iterate until we get a cost function output close to 0, and test accuracy! (A rough end-to-end sketch of this loop follows below.)
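None of the code below is from the talk; it is a from-scratch sketch that ties the steps above together for the smallest possible "network" - a single sigmoid neuron with the quadratic cost, learning the AND function. A real digit recogniser needs layers of neurons and backpropagation through them, but the loop has the same shape: forward pass, cost, gradient, nudge.

```python
import math
import random

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# Toy data: learn the AND function with a single sigmoid neuron
training_data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]

# Start with random numbers for the weights and bias
random.seed(0)
weights = [random.uniform(-1, 1), random.uniform(-1, 1)]
bias = random.uniform(-1, 1)
learning_rate = 1.0

for epoch in range(5000):
    for inputs, desired in training_data:
        # Forward pass
        z = sum(x * w for x, w in zip(inputs, weights)) + bias
        actual = sigmoid(z)
        # Partial derivatives of the quadratic cost for this example
        delta = (actual - desired) * actual * (1 - actual)
        # Nudge every weight and the bias downhill
        weights = [w - learning_rate * delta * x for w, x in zip(weights, inputs)]
        bias -= learning_rate * delta

# After training, outputs should be close to the desired 0 / 0 / 0 / 1
for inputs, desired in training_data:
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    print(inputs, desired, round(sigmoid(z), 2))
```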

Slide 38

Slide 38 text

Summary: How it learns

Slide 39

Slide 39 text

Summary*
● Artificial neurons made of inputs, weights, a bias, and an activation function.
● Emergent, automatically inferred decision making based on tying all these neurons together into layers.
● Magical hidden layers of neural networks that infer decisions using rules humans would not.
● Improving performance by minimising the cost function.
● Using some hardcore maths to work out how to adjust thousands of variables to improve an algorithm - making it learn from its mistakes!
*AKA what I found cool about Neural Networks and hopefully you did too

Slide 40

Slide 40 text

Resources
● Neuralnetworksanddeeplearning.com (NN for any function, improving backprop)
● 3blue1brown (4 videos on NN, plenty of other great content)
● Gradient Descent + Backpropagation: https://medium.com/datathings/neural-networks-and-backpropagation-explained-in-a-simple-way-f540a3611f5e
● Machine Learning failures - for art! https://www.thestrangeloop.com/2018/machine-learning-failures---for-art.html
● Look at other types of NN / Machine Learning (RNN, CNN, untrained models)