Neural Networks - a brief introduction by a fan

A brief introduction to the core concepts of how a simple NN is structured and learns

Luke Williams

March 31, 2020

Transcript

  1. Neural Networks A brief introduction by a fan

  2. If you want to know more: neuralnetworksanddeeplearning.com, 3blue1brown

  3. Neural networks?

  4. Recognising handwritten numbers

  5. Recognising handwritten numbers

  6. Neurons

  7. Neurons

  8. The Perceptron - weights w1, w2, w3 feed an activation function applied to (x1 * w1) + (x2 * w2) + (x3 * w3)
  9. The Perceptron - fires when (x1 * w1) + (x2 * w2) + (x3 * w3) >= threshold. Move the threshold to the other side of the sum: bias = threshold * -1, giving ((x1 * w1) + (x2 * w2) + (x3 * w3)) + bias >= 0
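
In code, that rule is just a weighted sum plus a bias compared against zero. A minimal Python sketch (the function name and structure are mine, not from the slides):

    def perceptron(inputs, weights, bias):
        # Weighted sum of inputs, plus bias; fire (output 1) when it reaches 0
        total = sum(x * w for x, w in zip(inputs, weights))
        return 1 if total + bias >= 0 else 0
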
  10. The Perceptron - do I go to the park? Inputs: Is it raining? Is the temp above 15 C? Am I in lockdown due to COVID-19? Weights: 3, 1, -20. Threshold = 3, so bias = -3. Activation = ((1 * 3) + (0 * 1) + (1 * -20)) - 3 > 0? = (-17) - 3 > 0 = -20 > 0, so Output = 0
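
Plugging the slide's numbers into that sketch reproduces the result:

    # Inputs 1, 0, 1 with weights 3, 1, -20 and bias -3, as on the slide
    perceptron([1, 0, 1], [3, 1, -20], bias=-3)   # => 0, stay home
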
  11. Activation function

  12. Activation function - Sigmoid, applied to the sum of weights times inputs, plus bias
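
Replacing the hard threshold with a sigmoid gives a smooth output between 0 and 1. A small sketch, again with names of my choosing:

    import math

    def sigmoid(z):
        # Squashes any real number into the range (0, 1)
        return 1 / (1 + math.exp(-z))

    def sigmoid_neuron(inputs, weights, bias):
        # Same weighted sum as the perceptron, but with a smooth activation
        z = sum(x * w for x, w in zip(inputs, weights)) + bias
        return sigmoid(z)
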
  13. Neural networks - Structure

  14. Handwritten numbers - Structure

  15. Handwritten numbers - Structure

  16. Handwritten numbers - Structure

  17. Handwritten numbers - Structure

  18. Handwritten numbers - Structure

  19. Handwritten numbers - Structure

  20. How does it work? • Start with random numbers for all weights and biases • Train the network with training examples • Assess how well it did by comparing the actual output with the desired output, using a cost function (or loss function) to compute the error • Try to reduce this error by tuning the weights and biases in the network
  21. Cost function - outputs. Training a network to recognise a 6. Desired: 0 0 0 0 0 0 1 0 0 0 0
  22. Cost function - outputs. Training a network to recognise a 6. Desired: 0 0 0 0 0 0 1 0 0 0 0; Actual: 0.3 0 0.5 0.2 0 0.1 0.3 0 0 0.9 0.8
  23. Cost function - outputs. Training a network to recognise a 6. Desired: 0 0 0 0 0 0 1 0 0 0 0; Actual: 0.3 0 0.5 0.2 0 0.1 0.3 0 0 0.9 0.8; |Difference|: 0.3 0 0.5 0.2 0 0.1 0.7 0 0 0.9 0.8
  24. Cost function - built from the desired output, the actual output, the |difference|, and the number of training inputs, over all the weights and biases in the network
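
One common concrete choice, the quadratic cost used in neuralnetworksanddeeplearning.com, averages the squared differences over the n training inputs. A rough Python sketch (the function and variable names are mine):

    import numpy as np

    def quadratic_cost(desired, actual):
        # C = (1 / 2n) * sum over training inputs of ||desired - actual||^2
        n = len(desired)
        return sum(np.sum((np.array(d) - np.array(a)) ** 2)
                   for d, a in zip(desired, actual)) / (2 * n)
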
  25. Cost function - training has gone well when C(w,b) ≈ 0; otherwise, the network is still getting answers wrong

  26. How to minimise a function? C(w) = w^2

  27. C(w) = w^2, so dC/dw = 2w. Setting 2w = 0 gives the minimum at w = 0
  28. How to minimise a function? C(w1,w2) • Two variables = 3D graph • 3+ variables = ??? puny human brain. But that's fine, we can use the derivative. • Use partial differentiation to understand the derivative of a function with multiple inputs
  29. None
  30. • Start at a random value for the input • Work out the gradient at this point (the derivative) • Determine how we should change the input to ‘descend’ down the slope depending on the current gradient • Repeat until you reach a local minimum
  31. • Start at a random value for the input • Work out the gradient at this point (the derivative) • Determine how we should change the input to ‘descend’ down the slope depending on the current gradient • Repeat until you reach a local minimum
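
Applied to the earlier C(w) = w^2 example, those steps look roughly like this (the starting point, learning rate and step count are arbitrary choices of mine):

    def gradient_descent(gradient, start, learning_rate=0.1, steps=100):
        # Repeatedly nudge w downhill by learning_rate * gradient at the current point
        w = start
        for _ in range(steps):
            w = w - learning_rate * gradient(w)
        return w

    # C(w) = w^2, so dC/dw = 2w; this lands very close to the minimum at w = 0
    gradient_descent(lambda w: 2 * w, start=5.0)
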
  32. Gradient Descent + Tuning Inputs • We have our cost function
  33. Gradient Descent + Tuning Inputs • We have our cost function • We have the gradient of C for the current inputs - some shiny maths (a vector of partial derivatives of C for each variable in the system).
  34. Gradient Descent + Tuning Inputs • We have our cost function • We have the gradient of C for the current inputs - some shiny maths (a vector of partial derivatives of C for each variable in the system). • Use gradient descent to work out the change we want to make to each variable’s current value - the gradient times a variable called the learning rate
  35. Gradient Descent + Tuning Inputs • We have our cost function • We have the gradient of C for the current inputs - some shiny maths (a vector of partial derivatives of C for each variable in the system). • Use gradient descent to work out the change we want to make to each variable’s current value - the gradient times a variable called the learning rate • Produces a list of changes/nudges for every weight and bias in the system
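
For a whole network the same update is applied to every weight and bias at once. A sketch, assuming the gradients have already been computed (for example by backpropagation); the function name is mine:

    def apply_nudges(weights, biases, weight_grads, bias_grads, learning_rate):
        # Nudge every weight and bias a small step against its gradient
        new_weights = [w - learning_rate * gw for w, gw in zip(weights, weight_grads)]
        new_biases = [b - learning_rate * gb for b, gb in zip(biases, bias_grads)]
        return new_weights, new_biases
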
  36. None
  37. Summary: How it learns • Start with random numbers for all weights and biases • Train the network with training examples • Assess how well it did at recognising the numbers using a cost function • Minimise the cost function using gradient descent, creating a list of small nudges to the current values • Update all weights + biases for all neurons in one layer, then do the same process for every neuron in the previous layer (backpropagation) • Iterate until we get a cost function output close to 0, then test the accuracy!
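
The sample code that accompanies neuralnetworksanddeeplearning.com wraps this whole loop up; roughly, and from memory (check the book's repository for the exact signatures), training a digit recogniser looks like:

    import mnist_loader
    import network

    training_data, validation_data, test_data = mnist_loader.load_data_wrapper()

    # 784 input pixels, one hidden layer of 30 neurons, 10 output neurons
    net = network.Network([784, 30, 10])

    # Stochastic gradient descent: 30 epochs, mini-batches of 10, learning rate 3.0
    net.SGD(training_data, 30, 10, 3.0, test_data=test_data)
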
  38. Summary: How it learns

  39. Summary* • Artificial neurons made of inputs, weights, a bias, and an activation function • Emergent, automatically inferred decision making based on tying all these neurons together into layers • Magical hidden layers of neural networks that infer decisions using rules humans would not • Improving performance by minimising the cost function • Using some hardcore maths to work out how to adjust thousands of variables to improve an algorithm - making it learn from its mistakes! *AKA what I found cool about Neural Networks and hopefully you did too
  40. Resources • Neuralnetworksanddeeplearning.com (NN for any function, improving backprop) • 3blue1brown (4 videos on NN, plenty of other great content) • Gradient Descent + Backpropagation https://medium.com/datathings/neural-networks-and-backpropagation-explained-in-a-simple-way-f540a3611f5e • Machine Learning failures - for art! https://www.thestrangeloop.com/2018/machine-learning-failures---for-art.html • Look at other types of NN / Machine Learning (RNN, CNN, untrained models)