
Together toward an AI plus ultra

You've heard about AI for some time, but it remains obscure to you. How does it work? What is behind this word? If reading white papers or doctoral theses is not your thing, if the TensorFlow docs are just a big pile of words to you, and if you've sat through unclear presentations that use the same vocabulary without ever giving you an idea of how it works or how to implement an AI at home... this presentation is for you. By the end, you will be able to play with a simple AI that will serve as a gateway to the beautiful world of machine learning.

Grégoire Hébert

November 22, 2019

Transcript

  1. AN AI
 NEAT PLUS ULTRA Good afternoon, everyone. I am

    thrilled to be here with you on stage!
  2. Grégoire Hébert Senior Developer — Trainer — Lecturer @ Les-Tilleuls.coop

    @gheb_dev @gregoirehebert AN AI
 NEAT PLUS ULTRA Let me introduce myself :)
 My name is Grégoire Hébert. I am a senior developer, lecturer and speaker at Les-Tilleuls.coop.
 If you can't pronounce it properly, come to our booth, we'll teach you :)

  3. @gheb_dev @gregoirehebert Machine Learning Image Recognition Language Processing Autonomous Vehicles

    Medical Diagnostics Robotics Recommender Systems We are going to spend 40 minutes together, and the subject is machine learning.
 Who, in this room, has never worked with artificial intelligence before? Raise your hand. For all the others, what I am about to say may sound trivial, or simplistic, but the goal here is to set a starting point.
 Because YES, doing research on machine learning is not something trivial. Machine learning is a subfield of AI, but it also feeds the others. Without going through all of them, I showed some fields where it is used; we are going to focus on things we want to make autonomous. Now, how complex does an AI have to be? Does an AI have to be all-powerful?
 Of course not, we've got multiple levels of complexity.
 Starting from
  4. @gheb_dev @gregoirehebert REACTIVE MACHINES (Scenario reactive) LIMITED MEMORY (Environment reactive)

    Limited Memory, where we begin to act according to time, location, and extra knowledge about the surroundings.
 For instance, my car's GPS knows I am leaving the office at 6 p.m., so it shows me every known restaurant on my way home.
  5. @gheb_dev @gregoirehebert REACTIVE MACHINES (Scenario reactive) LIMITED MEMORY (Environment reactive)

    THEORY OF MIND (People awareness) For the theory of mind, it's not just a blind guess anymore. The AI knows me! I had a harsh day, so it'll show my favourite comforting restaurants…
 McDonald's, KFC… and a little pad thai restaurant.
  6. @gheb_dev @gregoirehebert REACTIVE MACHINES (Scenario reactive) LIMITED MEMORY (Environment reactive)

    THEORY OF MIND (People awareness) SELF AWARE Self-aware!! Beware! It starts to rule the world!
  7. SELF AWARE @gheb_dev @gregoirehebert REACTIVE MACHINES (Scenario reactive) LIMITED MEMORY

    (Environment reactive) THEORY OF MIND (People awareness)
  8. @gheb_dev @gregoirehebert REACTIVE MACHINES (Scenario reactive) Ok, before that, we

    need to grasp the subtleties of the first level, Reactive Machines.
 And don't mistake its simplicity for a lack of capabilities. The goal now is to see, together, how each one of you could leave this room able to write a simple AI, and then be drawn into the abyss of machine learning, maybe even grow a passion for it :)
  9. @gheb_dev @gregoirehebert INPUT Alright, everything starts from an input.
 An

    input can be a number, a set of numbers, an image, a string. Well, if I want to treat everything the same way, I'll need to normalise that input.
 For a picture of my cat, I need to transform that JPEG file into a matrix of values, each one between 0 and 255 for red, green and blue. For each type of data I need to normalise its representation into something the system can read and exploit. The larger the data set, the better the result.
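As a quick sketch (not the code from the deck or the repository), normalising an input in plain PHP could look like this; the helper names are invented for the example:

```php
<?php

// A minimal sketch of input normalisation (helper names invented for the example).

// Scale an RGB channel (0–255) into the 0–1 range.
function normalizeChannel(int $channel): float
{
    return $channel / 255;
}

// Scale the hunger score used later in the talk (0–10) into the 0–1 range.
function normalizeHunger(int $hunger): float
{
    return $hunger / 10;
}

echo normalizeChannel(200), PHP_EOL; // 0.7843...
echo normalizeHunger(8), PHP_EOL;    // 0.8
```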
  10. @gheb_dev @gregoirehebert INPUT ? That input will be computed into

    a value.
 At the moment we don’t know how.
  11. @gheb_dev @gregoirehebert INPUT ? OUTPUT And that value will, in

    the end, be computed into a result.
 This result is the output we expect, the answer if you will.
  12. @gheb_dev @gregoirehebert INPUT ? OUTPUT PERCEPTRON Well, this simplest representation

    is called a perceptron. Let's put that into something concrete, shall we?
  13. @gheb_dev @gregoirehebert ? Or not 0 - 10 To decide,

    I need to normalise my stomach emptiness: 0 means I am not hungry, 10 means I am starving.
  14. @gheb_dev @gregoirehebert ? Or not 0 - 10 0 -

    1 Activation To get to the intermediate value I will use a weight.
 This weight starts as a random value between 0 and 1.
 This is arbitrary; it could be between 1 and 10, or 100. It's up to you.
 How to choose, then? By experience. After running through a lot of datasets, you start having a spider sense about where the final value could be.
  15. @gheb_dev @gregoirehebert ? Or not 0 - 10 0 -

    1 Activation We are then going to pass the result of the weight multiplied by the input 
 through an activation function.
 An activation function helps us control how the input value is transformed into an output value, according to its behaviour.
  16. @gheb_dev @gregoirehebert 0 - 10 ? Or not 0 -

    1 0 - 1 Activation Activation
  17. @gheb_dev @gregoirehebert What is an activation function? It's a

    function,
 a mathematical function. Well, not really just one function.
 There are a few functions that are useful as activation functions.
  18. @gheb_dev @gregoirehebert Binary Step Gaussian This one, if you do

    statistics, you know it well :) 
 It's the same curve that represents the normal distribution.
  19. @gheb_dev @gregoirehebert Binary Step Gaussian Hyperbolic Tangent Parametric Rectified Linear

    Unit Sigmoid Thresholded Rectified Linear Unit Like ReLU, but with a threshold.
  20. @gheb_dev @gregoirehebert Binary Step Gaussian Hyperbolic Tangent Parametric Rectified Linear

    Unit Sigmoid Thresholded Rectified Linear Unit The most common one is the sigmoid; that's the one we are going to use today.
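For reference, the sigmoid squashes any real number into the (0, 1) range: sigmoid(x) = 1 / (1 + e^(-x)). A minimal PHP version (a sketch, not necessarily the exact code from the repository linked later):

```php
<?php

// Sigmoid activation: maps any real number into the (0, 1) range.
function sigmoid(float $x): float
{
    return 1 / (1 + exp(-$x));
}

echo sigmoid(0.0), PHP_EOL; // 0.5
echo sigmoid(2.0), PHP_EOL; // ≈ 0.8808, the hidden value we will compute later
```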
  21. 0 - 10 ? 0 - 1 0 - 1

    Activation Activation
  22. ? Or not 0 - 10 0 - 1 0

    - 1 Activation Activation @gheb_dev @gregoirehebert And we are going to repeat the operation twice:
 a first time to get the intermediate value, 
 and a second time to get the output.
  23. @gheb_dev @gregoirehebert ? Or not 0 - 10 0 -

    1 0 - 1 Sigmoid Sigmoid Ok, now we've got a coefficient (a weight) to multiply our value by, and an activation function to obtain a value contained within a controlled range. Most of the time this is not enough. You need to see the first calculation as a force.
 Imagine for a second that I am a Jedi. I want to pull a person from the audience to me.
 If I were to pull them toward the stage with a single force in one direction, they would probably end up stuck above the screen, face completely smashed. I need a second force to direct the trajectory toward me.
  24. ? Or not 0 - 10 0 - 1 0

    - 1 Sigmoid Sigmoid @gheb_dev @gregoirehebert I need a bias to apply to the value.
  25. ? Or not 0 - 10 0 - 1 0

    - 1 Sigmoid Sigmoid @gheb_dev @gregoirehebert Bias Bias This bias is another term, a simple addition to perform. Its values, at the start, are, like the weights, chosen at random between 0 and 1.
 As before, it could go up to 10 or 100.
 Let's replace all that with concrete numbers for the example.
  26. ? Or not 8 0.2 0.3 Sigmoid Sigmoid @gheb_dev @gregoirehebert

    0.4 0.8 In this situation I am hungry.
 0.2 and 0.3 are the weights, and 0.4 and 0.8 are, respectively, the biases. Note that I did not name the value in between the input and the output, that intermediate representation.
  27. H Or not 8 0.2 0.3 Sigmoid Sigmoid @gheb_dev @gregoirehebert

    0.4 0.8 I'll write it H, for Hidden.
 Each intermediate representation is called a node, and since we don't really know ahead of time what its value is, and frankly don't really care at the moment, we say that the node is hidden. Now that we have established the system, let's dive into the maths.
  28. 0 - 10 ? 0 - 1 0 - 1

    Activation Activation Before going further, I must confess:
 I am not a math guy. I got my degree with 3/20 in mathematics.
 But as soon as I started to learn about AI, I discovered brilliant YouTube channels that gave me better ways to learn the math, and I started to realise that I am a math guy after all. I love it.
 I just never had a way to learn that fit me.
 Anyway, the math we are going to do is simple, even for me.
  29. @gheb_dev @gregoirehebert H = sigmoid (Input x weight + bias)

    To get the hidden value, 
 we need to calculate the input multiplied by the weight, plus the bias.
 The result is then passed into the sigmoid.
  30. @gheb_dev @gregoirehebert H = sigmoid (8 x 0.2 + 0.4)

    With our example numbers this gives us sigmoid(8 x 0.2 + 0.4)
  31. @gheb_dev @gregoirehebert H = sigmoid (8 x 0.2 + 0.4)

    H = 0.88078707797788 The result is 0.8807… Is it good? We don’t know. We need to do every operation.
  32. @gheb_dev @gregoirehebert H = sigmoid (8 x 0.2 + 0.4)

    H = 0.88078707797788 O = sigmoid (H x W + B) Now the output is the sigmoid(H x W + B)
  33. @gheb_dev @gregoirehebert H = sigmoid (8 x 0.2 + 0.4)

    H = 0.88078707797788 O = sigmoid (H x 0.3 + 0.8) With our example value it’s sigmoid (0.8807 x 0.3 + 0.8)
  34. @gheb_dev @gregoirehebert H = sigmoid (8 x 0.2 + 0.4)

    H = 0.88078707797788 O = sigmoid (H x 0.3 + 0.8) O = 0.74349981350761 It gives us 0.74. Let’s agree that over 0.5 I eat, and under, I don’t.
  35. @gheb_dev @gregoirehebert H = sigmoid (8 x 0.2 + 0.4)

    H = 0.88078707797788 O = sigmoid (H x 0.3 + 0.8) O = 0.74349981350761 I was hungry, and I ate :D
  36. @gheb_dev @gregoirehebert H = sigmoid (8 x 0.2 + 0.4)

    H = 0.88078707797788 O = sigmoid (H x 0.3 + 0.8) O = 0.74349981350761 Is it good on the first run? To know, I need to run all the math again with a lower input.
  37. @gheb_dev @gregoirehebert H = sigmoid (2 x 0.2 + 0.4)

    H = 0.6897448112761 O = sigmoid (H x 0.3 + 0.8) O = 0.73243113381927 Let’s say 2, I am not hungry.
 I can see that the output result is not that different…
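To make the two runs above easier to reproduce, here is a minimal forward pass in PHP with the example weights and biases (a sketch using the same numbers as the slides, reusing the sigmoid helper from earlier):

```php
<?php

function sigmoid(float $x): float
{
    return 1 / (1 + exp(-$x));
}

// One input, one hidden node, one output: O = sigmoid(sigmoid(I * w1 + b1) * w2 + b2).
function forward(float $input, float $w1, float $b1, float $w2, float $b2): float
{
    $hidden = sigmoid($input * $w1 + $b1);

    return sigmoid($hidden * $w2 + $b2);
}

// The example parameters from the slides.
echo forward(8, 0.2, 0.4, 0.3, 0.8), PHP_EOL; // ≈ 0.7435 → above 0.5, so "eat"
echo forward(2, 0.2, 0.4, 0.3, 0.8), PHP_EOL; // ≈ 0.7324 → still "eat", which is wrong
```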
  38. @gheb_dev @gregoirehebert H = sigmoid (2 x 0.2 + 0.4)

    H = 0.6897448112761 O = sigmoid (H x 0.3 + 0.8) O = 0.73243113381927 I ate too much. I would have died like a filleted goose.
 We need to fix the weights and biases until the numbers are right.
  39. H = sigmoid (2 x 0.2 + 0.4) H =

    0.6897448112761 O = sigmoid (H x 0.3 + 0.8) O = 0.73243113381927 @gheb_dev @gregoirehebert TRAINING We need to train our system.
  40. @gheb_dev @gregoirehebert H = sigmoid (2 x 0.2 + 0.4)

    H = 0.6897448112761 O = sigmoid (H x 0.3 + 0.8) O = 0.73243113381927
  41. H Or not 8 0.2 0.3 Sigmoid Sigmoid @gheb_dev @gregoirehebert

    0.4 0.8 BACK PROPAGATION The training method I am going to show you is called back propagation. Its purpose is to correct, iteration after iteration, each of the weights and biases until the result satisfies our goal.
  42. H Or not 8 0.2 0.3 Sigmoid Sigmoid @gheb_dev 0.4

    0.8 BACK PROPAGATION It works by applying a correction to each value, from the output back to the input, based on the difference between the expected result and the one we obtained.
  43. H Or not 8 0.2 0.3 Sigmoid Sigmoid @gheb_dev 0.4

    0.8 BACK PROPAGATION LINEAR GRADIENT DESCENT Remember when I pictured the force used to pull someone here, and not up there?
 The same principle applies here. We need to change the values, but the correction must be neither too large nor too small. We need to use a mathematical technique called linear gradient descent.
 Ok, I have the feeling that we should go through the maths, because so far it's just words.
  44. @gheb_dev @gregoirehebert BACK PROPAGATION LINEAR GRADIENT DESCENT ERROR Ok, we

    need to get the error: the difference between the result and what was expected.
  45. @gheb_dev @gregoirehebert BACK PROPAGATION LINEAR GRADIENT DESCENT ERROR EXPECTATION -

    OUTPUT = Take the two values and subtract them: you've got the error.
 If I were to apply that difference directly to each value, the result might not be the one we expect.
  46. @gheb_dev @gregoirehebert BACK PROPAGATION LINEAR GRADIENT DESCENT ERROR EXPECTATION -

    OUTPUT = Let's say we are in a world where friction does not exist.
 If I continuously apply the same force to the train over time… well.
  47. @gheb_dev @gregoirehebert BACK PROPAGATION LINEAR GRADIENT DESCENT ERROR EXPECTATION -

    OUTPUT = We need to adjust a few things to get it working.
  48. @gheb_dev @gregoirehebert LINEAR GRADIENT DESCENT Imagine we replace the rail

    track with this function's curve.
 It's quite similar: it's a curve.
 My goal is to find the lowest position on the curve.
  49. @gheb_dev @gregoirehebert LINEAR GRADIENT DESCENT My starting point is arbitrary.

    
 As a human with two properly working eyes, I can easily eyeball that I need to reduce the value.
 I am too far.
 But from a computer's perspective, how do I know that?
  50. @gheb_dev @gregoirehebert LINEAR GRADIENT DESCENT I need to find the

    slope. The slope tells us whether the function increases or decreases toward the next value, and the same for the previous one, so I can apply the right operation: should I go forward or go back?
  51. @gheb_dev @gregoirehebert LINEAR GRADIENT DESCENT Thanks to the slope, I

    know I can go back.
 In math, to get a function's slope, we use what's called its derivative.
  52. @gheb_dev @gregoirehebert LINEAR GRADIENT DESCENT The derivative, or slope:

    for any function f, its derivative f'
 gives the direction.

 S > 0: you must decrease the value
 S < 0: you must increase the value The result of the derivative gives us the direction.
 If the slope is above 0 we must decrease the value, 
 below 0, we must increase the value, so that we always walk down the curve. Simple.
  53. @gheb_dev @gregoirehebert BACK PROPAGATION LINEAR GRADIENT DESCENT ERROR EXPECTATION -

    OUTPUT = Sigmoid’ (OUTPUT) We apply the derivative of the sigmoid, with the output as the input value.
 We get a result, which is?
 The slope, thank you for following!
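A useful detail here: the sigmoid's derivative has a convenient closed form, sigmoid'(x) = sigmoid(x) × (1 − sigmoid(x)). Since our output is already a sigmoid value, the slope can be computed directly from it (this is the usual trick; the slides don't spell it out):

```php
<?php

// Slope of the sigmoid, expressed from an already-activated value.
function dsigmoid(float $activated): float
{
    return $activated * (1 - $activated);
}

echo dsigmoid(0.74349981350761), PHP_EOL; // ≈ 0.1907, the slope at our example output
```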
  54. @gheb_dev @gregoirehebert BACK PROPAGATION LINEAR GRADIENT DESCENT ERROR EXPECTATION -

    OUTPUT = Sigmoid’ (OUTPUT) Multiplied by the error We multiply this by the error.
  55. @gheb_dev @gregoirehebert BACK PROPAGATION LINEAR GRADIENT DESCENT ERROR EXPECTATION -

    OUTPUT = Sigmoid’ (OUTPUT) Multiplied by the error And a LEARNING RATE And a learning rate.
 What is a learning rate?
  56. @gheb_dev @gregoirehebert LINEAR GRADIENT DESCENT But depending on the error,

    I have a chance to overshoot my target.
 The greater the error, the bigger the chance.
  57. @gheb_dev @gregoirehebert BACK PROPAGATION LINEAR GRADIENT DESCENT ERROR EXPECTATION -

    OUTPUT = Sigmoid’ (OUTPUT) Multiplied by the error And the LEARNING RATE I need a learning rate to temper this.
 Remember when I wanted to pull someone onto the stage: 
 I can prevent them from flying off into the sky, but not if it means breaking their spine on the floor…
  58. @gheb_dev @gregoirehebert BACK PROPAGATION LINEAR GRADIENT DESCENT ERROR EXPECTATION -

    OUTPUT = GRADIENT Sigmoid’ (OUTPUT) = Multiplied by the error And the LEARNING RATE We call the result of this formula the gradient. And this is the coefficient we want to apply to our different weights and biases.
 But I must warn you. From the start we chose everything at random, expecting the values to be fixed through iterations. But the learning rate is something you need to tune yourself. If it's too high, you might never reach your goal: you keep passing by it without ever stopping close enough. And if it's too small, you've got two problems coming: first, you will need far too many iterations to find the minimum value on the curve; second, you can get trapped in a valley. Let me show you this:
  59. @gheb_dev @gregoirehebert BACK PROPAGATION LINEAR GRADIENT DESCENT Beware of the

    local minimum If I am on the right part of the track, I'll find a local minimum.
 But I can see that I am not in the lowest part of the track.
 This is where the human is useful: you know, or have the feeling, that it's not right.
 In addition to saying "here is the objective", you can adjust the learning rate so you can climb over hills.
 This is where it can get complex and you might need more advanced systems.
  60. @gheb_dev @gregoirehebert BACK PROPAGATION LINEAR GRADIENT DESCENT ERROR EXPECTATION -

    OUTPUT = GRADIENT Sigmoid’ (OUTPUT) = Multiplied by the error And the LEARNING RATE ΔWeights GRADIENT x H = Alright, by multiplying the gradient by the hidden value, we get the delta for the weights.
  61. H Or not 8 0.2 0.3 Sigmoid Sigmoid @gheb_dev 0.4

    0.8 BACK PROPAGATION LINEAR GRADIENT DESCENT Remember our little diagram? We are going to go back and update every single one of the four values.
  62. @gheb_dev @gregoirehebert BACK PROPAGATION LINEAR GRADIENT DESCENT ERROR EXPECTATION -

    OUTPUT = GRADIENT Sigmoid’ (OUTPUT) = Multiplied by the error And the LEARNING RATE ΔWeights GRADIENT x H = Weights ΔWeights + weights = The new weight values are the delta (which might be negative) added to the previous weight values.
  63. @gheb_dev @gregoirehebert BACK PROPAGATION LINEAR GRADIENT DESCENT ERROR EXPECTATION -

    OUTPUT = GRADIENT Sigmoid’ (OUTPUT) = Multiplied by the error And the LEARNING RATE ΔWeights GRADIENT x H = Weights ΔWeights + weights = Bias Bias + GRADIENT = For the bias we don't need anything fancier: just adding the gradient does the trick.
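Putting the update rules together, one full back-propagation step for this tiny network could look like the sketch below. It follows the formulas on the slides (gradient = error × sigmoid'(output) × learning rate, ΔWeights = gradient × H, bias += gradient); propagating the error to the hidden node through the outgoing weight is the usual convention and an assumption on my part, not something spelled out on the slides:

```php
<?php

function sigmoid(float $x): float { return 1 / (1 + exp(-$x)); }
function dsigmoid(float $a): float { return $a * (1 - $a); }

// Parameters, starting from the example values (normally they would be random).
$w1 = 0.2; $b1 = 0.4; // input  -> hidden
$w2 = 0.3; $b2 = 0.8; // hidden -> output
$learningRate = 0.1;

// Hunger level (0–10) => expected decision (1 = eat, 0 = don't eat).
$samples = [[8, 1], [2, 0], [9, 1], [1, 0], [6, 1], [3, 0]];

for ($i = 0; $i < 50000; $i++) {
    [$input, $target] = $samples[$i % count($samples)];

    // Forward pass.
    $h = sigmoid($input * $w1 + $b1);
    $o = sigmoid($h * $w2 + $b2);

    // Output layer: gradient = error × slope × learning rate.
    $error    = $target - $o;
    $gradient = $error * dsigmoid($o) * $learningRate;
    $oldW2    = $w2;
    $w2 += $gradient * $h; // ΔWeights = gradient × H
    $b2 += $gradient;      // bias += gradient

    // Hidden layer: same recipe, with the error carried back through the old weight.
    $hGradient = ($error * $oldW2) * dsigmoid($h) * $learningRate;
    $w1 += $hGradient * $input;
    $b1 += $hGradient;
}

echo sigmoid(sigmoid(8 * $w1 + $b1) * $w2 + $b2), PHP_EOL; // should have moved toward 1
echo sigmoid(sigmoid(2 * $w1 + $b1) * $w2 + $b2), PHP_EOL; // should have moved toward 0
```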
  64. H Or not 8 0.2 0.3 Sigmoid Sigmoid @gheb_dev 0.4

    0.8 BACK PROPAGATION LINEAR GRADIENT DESCENT Let’s see what it should look like after a few iterations. We started from there and we end up with
  65. H Or not 8 4.80 7.66 Sigmoid Sigmoid @gheb_dev -26.61

    -3.75 BACK PROPAGATION LINEAR GRADIENT DESCENT These results.
  66. H 8 4.80 7.66 Sigmoid Sigmoid @gheb_dev -26.61 -3.75 BACK

    PROPAGATION LINEAR GRADIENT DESCENT 0.97988 For this combination and an 8-out-of-10 hunger level, I get a value of 0.97988.
  67. H 4.80 7.66 Sigmoid Sigmoid @gheb_dev -26.61 -3.75 BACK PROPAGATION

    LINEAR GRADIENT DESCENT 2 0.02295 And for a 2-out-of-10 hunger level, I get a much more suitable number.
  68. @gheb_dev @gregoirehebert CONGRATULATIONS ! Congratulations! You are already capable

    of creating a small yet powerful machine learning system.
 Who feels that building this kind of system is within reach now? Raise your hand.
 Alright, for the others, let me show you that you underestimate yourselves.
  69. CONGRATULATIONS ! Let’s play together :) https://github.com/GregoireHebert/nn/ @gheb_dev @gregoirehebert You

    can grab the code and play with it at home. This is plain PHP, with only one dependency to manipulate matrices of values, that's it. We are at a Symfony conference, so I could not resist: I made a small toy with this 
 and, with 2 or 3 Symfony components, I created a Tamagotchi. An autonomous sheep :D
 It comes in an example branch. I installed it on a Raspberry Pi with an LCD screen and asked a friend to print a box for me, which you have probably already seen at the booth.
  70. CONGRATULATIONS ! Let’s play together :) @gheb_dev @gregoirehebert Now that

    you have touched the most minimal system with your fingertips,
 what can we do to improve it and tackle more complex things?
  71. @gheb_dev @gregoirehebert Hungry EAT But I can make it a

    little more complex, and have multiple hidden nodes, each one with its own weights and biases. 
 Maybe for each node you can use a different activation function.
  72. @gheb_dev @gregoirehebert Hungry EAT MULTI LAYER PERCEPTRON Hungry EAT Hungry

    EAT We can multiply this horizontally and vertically into a multi-layer perceptron.
  73. @gheb_dev @gregoirehebert Hungry EAT MULTI LAYER PERCEPTRON Thirsty DRINK Sleepy

    SLEEP Where each input has an impact on the others.
 Ok, so far we always keep full control over the number of nodes and layers. What if…
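To give a feel for the multi-layer version, here is a small forward pass with several inputs and outputs, using plain PHP arrays instead of the matrix dependency (a sketch with arbitrary example weights, not the code from the repository):

```php
<?php

function sigmoid(float $x): float { return 1 / (1 + exp(-$x)); }

// One fully connected layer: output[j] = sigmoid(Σ_i input[i] × weights[j][i] + biases[j]).
function layer(array $inputs, array $weights, array $biases): array
{
    $outputs = [];
    foreach ($weights as $j => $row) {
        $sum = $biases[$j];
        foreach ($row as $i => $weight) {
            $sum += $inputs[$i] * $weight;
        }
        $outputs[$j] = sigmoid($sum);
    }

    return $outputs;
}

// 3 inputs (hungry, thirsty, sleepy) -> 4 hidden nodes -> 3 outputs (eat, drink, sleep).
// These weights and biases are arbitrary example values.
$hiddenWeights = [[0.2, -0.1, 0.4], [0.7, 0.3, -0.5], [-0.2, 0.8, 0.1], [0.5, 0.5, 0.5]];
$hiddenBiases  = [0.1, -0.3, 0.2, 0.0];
$outputWeights = [[0.3, -0.6, 0.2, 0.8], [-0.4, 0.9, 0.1, 0.2], [0.5, 0.1, -0.7, 0.3]];
$outputBiases  = [0.0, 0.1, -0.2];

$hidden = layer([0.8, 0.2, 0.5], $hiddenWeights, $hiddenBiases);
print_r(layer($hidden, $outputWeights, $outputBiases)); // one score per decision
```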
  74. @gheb_dev @gregoirehebert Hungry EAT MULTI LAYER PERCEPTRON Thirsty DRINK Sleepy

    SLEEP What if every decision is made randomly? The number of layers, nodes, weights, biases, and their values.
 And to check which of several configurations is the best, 
 
 imagine we create hundreds of configurations randomly, put them all into competition, and only keep the top 10. Then from that top ten we create a new set of 90 configurations with some slight mutations. And we go over and over, until the result is satisfying.
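As a very rough sketch of that keep-the-best-and-mutate loop (the general idea only, not NEAT itself, which also mutates the network topology), with randomConfig(), fitness() and mutate() as placeholders you would implement for your own problem:

```php
<?php

// Placeholders: build a random network configuration, score it, and tweak it slightly.
function randomConfig(): array { return ['weights' => [mt_rand() / mt_getrandmax()]]; }
function fitness(array $config): float { /* run the network, return a score */ return 0.0; }
function mutate(array $config): array { /* nudge weights, biases, layer sizes… */ return $config; }

// Start from 100 random configurations.
$population = array_map(fn () => randomConfig(), range(1, 100));

for ($generation = 0; $generation < 50; $generation++) {
    // Sort by fitness, best first, and keep the top 10.
    usort($population, fn (array $a, array $b) => fitness($b) <=> fitness($a));
    $best = array_slice($population, 0, 10);

    // Refill the population with mutated copies of the survivors.
    $population = $best;
    while (count($population) < 100) {
        $population[] = mutate($best[array_rand($best)]);
    }
}
```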
  75. @gheb_dev @gregoirehebert Hungry EAT N.E.A.T. Thirsty DRINK Sleepy SLEEP Neuro

    Evolution of Augmenting Topologies This is called NEAT: NeuroEvolution of Augmenting Topologies.
 This is really exciting!
  76. @gheb_dev @gregoirehebert Hungry EAT N.E.A.T. Thirsty DRINK Sleepy SLEEP Neuro

    Evolution of Augmenting Topologies https://github.com/GregoireHebert/tamagotchi I've started to play with it; you can find examples by looking for MarI/O,
 which is an implementation of this algorithm in Lua.
  77. @gheb_dev @gregoirehebert Going Further
 Data Normalization / Preparation
 Multiple Activation Functions
 Mutations
 Unsupervised Learning
 So, to summarise and go further, 
 you can dig into data normalisation and preparation, 
 multiple activation functions, applying mutations, and end up in the awesome vortex of unsupervised learning.
  78. @gheb_dev @gregoirehebert 3Blue1Brown is an absolute goldmine for the

    math behind theorems and for how neural networks work.
 The Coding Train is a teacher with a tremendous amount of coding videos about machine learning. Even if, in his case, it's more about drawing recognition and approximation, he has a full series about the fundamentals. Computerphile is a concentrate of gold nuggets!
  79. @gheb_dev @gregoirehebert Something more PHP-related: in PHP there is

    a huge library, PHP-ML.
 And since we have access to foreign function interfaces (FFI) in PHP, you can now use TensorFlow from PHP. It's experimental, but still.
  80. @gheb_dev @gregoirehebert THANK YOU! Thank you so much, that's it

    for me :) I just have one last question for you!
  81. @gheb_dev @gregoirehebert THANK YOU! How many sheep did you see

    ? How many sheep did you see during the presentation? :)