
Python AI - All you need to know about Machine Learning and Deep Learning

Alejandro Saucedo (Founder @ Exponential Technologies) and Donald Whyte (Software Engineer @ Engineers Gate) @ Moscow Python Conf 2017
"There is a lot of hype about deep learning and everything AI, however behind all the noise there is a set of solid concepts and algorithms that have massive potential if used in the right way and with the right data. In this talk we will provide you with the key concepts you will need to build a solid understanding around the core of Machine Learning. We will also cover key Deep Learning concepts and examples using Tensorflow that will help you understand the real potential of Deep Learning in practical applications. This talk will provide a theoretical overview that will then be put in practice in the Deep Learning workshop".
Video: https://conf.python.ru/python-ai-all-you-need-know-about-machine-learning-and-deep-learning/

Moscow Python Meetup

October 20, 2017

Transcript

  1. DEEP LEARNING WITH RECURRENT NEURAL NETWORKS IN PYTHON / /

    Donald Whyte @donald_whyte Alejandro Saucedo @AxSaucedo
  2. CREATE AN AI AUTHOR. Create a neural network that can

    write novels. Using 34,000 English novels to train the network.
  3. THE OUTPUT Gradually drawing away from the rest, two combatants

    are striving; each devoting every nerve, every energy, to the overthrow of the other. But each attack is met by counter attack, each terrible swinging stroke by the crash of equally hard pain or the dull slap of tough hard shield opposed in parry. More men are down. Even numbers of men on each side, these two combatants strive on.
  4. Less than 100 lines of Tensorflow code!

    # ONE
    import tensorflow as tf
    from tensorflow.contrib import layers, rnn
    import os
    import time
    import math
    import numpy as np
    tf.set_random_seed(0)

    # model parameters
    SEQLEN = 30
    BATCHSIZE = 200
    ALPHASIZE = 89
    INTERNALSIZE = 512
    NLAYERS = 3
    learning_rate = 0.001
    dropout_pkeep = 0.8

    codetext, valitext, bookranges = load_data()

    # the model
    lr = tf.placeholder(tf.float32, name='lr')        # learning rate
    pkeep = tf.placeholder(tf.float32, name='pkeep')  # dropout parameter
    batchsize = tf.placeholder(tf.int32, name='batchsize')

    # inputs
    X = tf.placeholder(tf.uint8, [None, None], name='X')
    Xo = tf.one_hot(X, ALPHASIZE, 1.0, 0.0)
    # expected outputs
    Y_ = tf.placeholder(tf.uint8, [None, None], name='Y_')
    Yo_ = tf.one_hot(Y_, ALPHASIZE, 1.0, 0.0)
    # input state
    Hin = tf.placeholder(tf.float32, [None, INTERNALSIZE * NLAYERS], name='Hin')

    # hidden layers
    cells = [rnn.GRUCell(INTERNALSIZE) for _ in range(NLAYERS)]
    multicell = rnn.MultiRNNCell(cells, state_is_tuple=False)

    # TWO
    Yr, H = tf.nn.dynamic_rnn(multicell, Xo, dtype=tf.float32, initial_state=Hin)
    H = tf.identity(H, name='H')

    # Softmax layer implementation
    Yflat = tf.reshape(Yr, [-1, INTERNALSIZE])
    Ylogits = layers.linear(Yflat, ALPHASIZE)
    Yflat_ = tf.reshape(Yo_, [-1, ALPHASIZE])
    loss = tf.nn.softmax_cross_entropy_with_logits(logits=Ylogits, labels=Yflat_)
    loss = tf.reshape(loss, [batchsize, -1])
    Yo = tf.nn.softmax(Ylogits, name='Yo')
    Y = tf.argmax(Yo, 1)
    Y = tf.reshape(Y, [batchsize, -1], name="Y")
    train_step = tf.train.AdamOptimizer(lr).minimize(loss)

    # Init for saving models
    if not os.path.exists("checkpoints"):
        os.mkdir("checkpoints")
    saver = tf.train.Saver(max_to_keep=1000)

    # init
    istate = np.zeros([BATCHSIZE, INTERNALSIZE * NLAYERS])
    init = tf.global_variables_initializer()
    sess = tf.Session()
    sess.run(init)
    step = 0

    # train on one minibatch at a time
    for x, y_, epoch in txt.rnn_minibatch_sequencer(codetext, BATCHSIZE, SEQLEN, nb_epochs=...):
        feed_dict = {X: x, Y_: y_, Hin: istate, lr: learning_rate,
                     pkeep: dropout_pkeep, batchsize: BATCHSIZE}
        _, y, ostate = sess.run([train_step, Y, H], feed_dict=feed_dict)
        if step // 10 % _50_BATCHES == 0:
            saved_file = saver.save(sess, 'checkpoints/rnn_train_' + timestamp, global_step=step)
            print("Saved file: " + saved_file)
        istate = ostate
        step += BATCHSIZE * SEQLEN
  5. merge all training documents into one → load as a flat list of chars → convert chars to integers → flat sequence of integers that represents all text in the dataset

    ['h','e','l','l','o',' ', 'm','y',' ','n','a','m','e',' ','i','s', ...]
    [10, 5, 12, 12, 17, 27, 15, 25, 27, 16, 1, 15, 5, 27, 6, 18, ...]
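As a rough sketch of this preprocessing step (illustrative only; names such as `char_to_int` are not from the deck):

    # Merge all training documents into one string, then map each
    # distinct character to a stable integer id.
    corpus = "hello my name is ..."          # all novels concatenated
    alphabet = sorted(set(corpus))
    char_to_int = {ch: i for i, ch in enumerate(alphabet)}
    int_to_char = {i: ch for ch, i in char_to_int.items()}

    # Flat sequence of integers representing all text in the dataset.
    encoded = [char_to_int[ch] for ch in corpus]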
  6. FEATURE SPACE

    A feature is some property that describes the raw input data. Features are represented as vectors in feature space. This abstracts away the complexity of the raw input for easier processing.
  7. CLASSIFICATION

    Training data is used to produce a model. The model divides feature space into segments; each segment corresponds to one output class. f(x̄) = mx̄ + c
  8. A result below zero means the left side of the line. The input shape is a triangle.

    x̄ = (3, 1)
    f(x̄) = 1(3) + (−3.5)(1) + 0 = 3 − 3.5 + 0 = −0.5
    −0.5 < 0, so the input falls on the left side of the line.
  9. No.

  10. GENERATING COHERENT TEXT REQUIRES MEMORY OF WHAT WAS WRITTEN PREVIOUSLY.

    [Diagram: entity annotations linking "Valentin" / "he" to a male person and "beer" / "lagers" to drinks]
    Valentin's favourite drink is beer. He likes lagers the most.
  11. Deep neural nets can learn patterns in complex data,

    like language. We can encode memory into the algorithm.
  12. Just use the raw input data. Our training data is

    the raw text of existing novels. No need for manual feature extraction.
  13. THE MIGHTY PERCEPTRON Equivalent to the straight line equation from

    before Linearly splits feature space Modeled after a neuron in the human brain
  14. THE MIGHTY PERCEPTRON

    Synonymous with our linear function f(x) = mx + c. For n features, the perceptron is defined as:
    y = f(w · x + b)
    where w is the n-dimensional weight vector, x the n-dimensional input vector, b the bias scalar, f the activation function, and y the output.
  15. ACTIVATION FUNCTION Simulates the 'firing' of a physical neuron. Takes

    the weighted sum and squashes it into a smaller range.
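As a rough illustration (not the workshop code), a perceptron with a sigmoid activation fits in a few lines of NumPy; the weights below reuse the worked example from the earlier classification slide:

    import numpy as np

    def sigmoid(s):
        # Squashes the weighted sum into the range (0, 1).
        return 1.0 / (1.0 + np.exp(-s))

    def perceptron(x, w, b):
        # y = f(w . x + b): weighted sum of the inputs plus a bias,
        # passed through the activation function.
        return sigmoid(np.dot(w, x) + b)

    # Feature vector (3, 1) with weights (1, -3.5) and bias 0, as on slide 8.
    print(perceptron(np.array([3.0, 1.0]), np.array([1.0, -3.5]), 0.0))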
  16. PERCEPTRON LEARNING ALGORITHM Algorithm which learns correct weights and bias

    Use training dataset to incrementally train perceptron Guaranteed to create line that divides output classes (if data is linearly separable)
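A minimal sketch of the classic perceptron update rule, assuming labels in {-1, +1} (illustrative; not necessarily the exact variant used in the workshop):

    import numpy as np

    def train_perceptron(X, y, epochs=100, learning_rate=0.1):
        # X: (n_samples, n_features) feature vectors, y: labels in {-1, +1}.
        w = np.zeros(X.shape[1])
        b = 0.0
        for _ in range(epochs):
            for xi, yi in zip(X, y):
                # Only adjust the line when it misclassifies this point.
                if yi * (np.dot(w, xi) + b) <= 0:
                    w += learning_rate * yi * xi
                    b += learning_rate * yi
        return w, b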
  17. REPRESENTING TEXT Make the input layer represent: a single word

    or a single character Use the input word/char to predict the next.
  18. WE WILL USE CHARACTERS AS THE INPUTS.

    [Diagram: input nodes (current char) connected to output nodes (next char), numbered 21 to 25]
  19. Input: b. Predicted char: ?. Current sentence: b?
  20. Input: b. Predicted char: a. Current sentence: ba
  21. Input: a. Predicted char: d. Current sentence: bad
  22. ...

  23. Input: e. Predicted char: d. Current sentence: ball games were often played
  24. PROBLEM Single perceptrons are straight line equations. They produce a single

    output, and hence cannot be used for complex problems like language. Need a network of neurons to output the full one-hot vector.
  25. SOLUTION: NEURAL NETWORKS Uses multi-layer perceptrons to: learn patterns in

    complex data, like language; produce the multiple outputs required for text prediction. Multiple layers provide flexibility in learning.
  26. NEURON CONNECTIVITY Each layer is fully connected to the next

    All nodes in layer l are connected to all nodes in layer l + 1. Every single connection has a weight.
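Since every node-to-node connection carries a weight, a fully connected layer reduces to a matrix multiplication; a small sketch (names are illustrative):

    import numpy as np

    def dense_layer(activations, W, b):
        # activations: outputs of layer l
        # W: weight matrix of shape (size of layer l+1, size of layer l)
        # b: bias vector for layer l+1
        # Returns the activations of layer l+1 (sigmoid activation assumed).
        return 1.0 / (1.0 + np.exp(-(W @ activations + b)))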
  27. LOSS FUNCTION: AN OPTIMIZATION PROBLEM

    Inputs: 1. the real output of the network after each batch, 2. the expected output (from our training data). Output: a number indicating the performance of the network.
  28. GRADIENT DESCENT OPTIMISER We optimise the network by minimising its

    loss. Keep adjusting the weights of each hidden layer... ...until loss is not getting any smaller.
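A toy sketch of that loop for a single weight, purely to illustrate stepping against the gradient until the loss stops shrinking:

    def gradient_descent(grad_fn, w=0.0, learning_rate=0.1, tolerance=1e-6):
        # Step against the gradient until the update is effectively zero.
        while True:
            step = learning_rate * grad_fn(w)
            if abs(step) < tolerance:
                return w
            w -= step

    # Example: minimise loss(w) = (w - 3)^2, whose gradient is 2(w - 3).
    print(gradient_descent(lambda w: 2 * (w - 3)))   # converges to ~3.0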
  29. BACKPROPAGATION Equivalent to gradient descent The training algorithm for neural

    networks For each feature vector in the training dataset, do a: 1. forward pass 2. backward pass
  30. After training the network, we obtain weights which minimise

    prediction error. Predict next character by running the last character through the forward pass step.
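In other words, prediction is one forward pass followed by an argmax over the output distribution. A rough sketch, where `forward_pass`, `char_to_int` and `int_to_char` are illustrative stand-ins for the trained network and the character lookups:

    import numpy as np

    def predict_next_char(last_char, forward_pass, char_to_int, int_to_char):
        # forward_pass runs the trained network and returns a probability
        # distribution over the alphabet for the next character.
        probabilities = forward_pass(char_to_int[last_char])
        return int_to_char[int(np.argmax(probabilities))]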
  31. HOWEVER... Network still has no memory of past characters. Valentin's

    favourite drink is beer. He likes lagers the most.
  32. RECURRENT NETWORKS

    O_0 (layer 0 output), O_1 (layer 1 output), O_2 (layer 2 output). The hidden layer's input includes the output of itself during the last run of the network.
  33. [Diagram: the network unrolled over time steps t=0 to t=5; input characters "B o b _", predicted next characters "o b i _"]
  34. [Diagram: unrolled over time steps t=0 to t=5; input characters "B o b i _", predicted next characters "o b i s _"]
  35. [Diagram: unrolled over time steps t=0 to t=5; input characters "B o b i s _", predicted next characters "o b i s _ _"]
  36. [Diagram: same as the previous slide; input characters "B o b i s _", predicted next characters "o b i s _ _"]
  37. PROBLEM: LONG-TERM DEPENDENCIES

    [Diagram: a long character sequence ("B o b i s _ ... a n m ...") unrolled over many time steps; the plain recurrent network fails to carry the early context that far ✖]
  38. CELL STATES Add extra state to each layer of the

    network. Remembers inputs far into the past. Transforms layer's original output into something that is relevant to the current context.
  39. [Diagram: layer outputs O_0, O_1, O_2, each paired with a cell state H_0, H_1, H_2]
  40. Hidden layer output and cell state are fed into the next

    time step. Gives network ability to handle long-term dependencies in sequences.
  41. [Diagram: the same long character sequence ("B o b i s _ ... a n m ...") unrolled over time; with cell state the network carries the early context across all those steps ✓]
  42. merge all training documents into one → load as a flat list of chars → convert chars to integers → use the integers to generate one-hot inputs for each time step

    ['h','e','l','l','o',' ', 'm','y',' ','n','a','m','e',' ','i','s', ...]
    [10, 5, 12, 12, 17, 27, 15, 25, 27, 16, 1, 15, 5, 27, 6, 18, ...]
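A small NumPy sketch of the one-hot step, assuming the integer encoding above and the deck's alphabet size of 89 (illustrative only):

    import numpy as np

    def one_hot(encoded, alphabet_size):
        # encoded: flat list of character ids. Returns one row per time
        # step with a single 1.0 in the column of that character.
        out = np.zeros((len(encoded), alphabet_size), dtype=np.float32)
        out[np.arange(len(encoded)), encoded] = 1.0
        return out

    print(one_hot([10, 5, 12, 12], alphabet_size=89).shape)   # (4, 89)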
  43. COMMON TRAINING METHODS

    Run backpropagation after: Stochastic: one sequence. Batch: all sequences. Mini-batch: a smaller batch of sequences.
  44. WHY?

    Stochastic: takes a long time to converge on good weights. Batch: consumes lots of memory, gets stuck on "okay" weights. Mini-batch: quick to converge and memory efficient.
  45. Iterate across all batches. Run backpropagation after processing each batch.

    [10, 5, 12, 12, 17, 27, 15, 25, 27, 16, 1, 15, 5, 27, 6, 18, ...]
    split into sequences: [10, 5, 12, 12] [17, 27, 15, 25] [27, 16, 1, 15] [5, 27, 6, 18] ...
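For example, cutting the flat integer sequence into fixed-length training sequences takes only a couple of lines (an illustrative sketch, not the workshop's generator):

    def split_into_sequences(encoded, sequence_length):
        # Drop the tail that doesn't fill a whole sequence.
        usable = len(encoded) - len(encoded) % sequence_length
        return [encoded[i:i + sequence_length]
                for i in range(0, usable, sequence_length)]

    print(split_into_sequences([10, 5, 12, 12, 17, 27, 15, 25, 27], 4))
    # [[10, 5, 12, 12], [17, 27, 15, 25]]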
  46. Building a neural network involves: 1. defining its architecture 2.

    learning the weight matrices for that architecture
  47. SOLUTION Can build very complex networks quickly Easy to extend

    if required Built-in support for RNN memory cells
  48. BUILD A RECURRENT NEURAL NETWORK TO GENERATE STORIES IN TENSORFLOW.

    [Architecture diagram: current char → Input → Hidden Recurrent Layers → Output → predicted next char]
  49. THE COMPUTATION GRAPH

    tf.Tensor: unit of data; vectors or matrices of values (floats, ints, etc.).
    tf.Operation: unit of computation; takes 0+ tf.Tensors as inputs and outputs 0+ tf.Tensors.
    tf.Graph: collection of connected tf.Tensors and tf.Operations; operations are nodes and tensors are edges.
  50. GRAPH THAT TRIPLES NUMBERS AND SUMS THEM.

    # 1. Define Inputs
    # Input is a 2D vector containing the two numbers to triple.
    inputs = tf.placeholder(tf.float32, [2])
    # 2. Define Internal Operations
    tripled_numbers = tf.scalar_mul(3, inputs)
    # 3. Define Final Output
    # Sum the previously tripled inputs.
    output_sum = tf.reduce_sum(tripled_numbers)
    # 4. Run the graph with some inputs to produce the output.
    session = tf.Session()
    result = session.run(output_sum, feed_dict={inputs: [300, 10]})
    print(result)

    Output: 930
  51. DEFINING HYPERPARAMETERS

    # Input Hyperparameters
    SEQUENCE_LEN = 30
    BATCH_SIZE = 200
    ALPHABET_SIZE = 98

    # Hidden Recurrent Layer Hyperparameters
    HIDDEN_LAYER_SIZE = 512
    NUM_HIDDEN_LAYERS = 3
  52. [Architecture diagram: the Input takes the current char]

    # Dimensions: [ BATCH_SIZE, SEQUENCE_LEN ]
    X = tf.placeholder(tf.uint8, [None, None], name='X')
  53. [Architecture diagram: the Input takes the current char]

    # Dimensions: [ BATCH_SIZE, SEQUENCE_LEN, ALPHABET_SIZE ]
    Xo = tf.one_hot(X, ALPHABET_SIZE, 1.0, 0.0)
  54. DEFINING HIDDEN STATE

    from tensorflow.contrib import rnn

    # Cell State
    # [ BATCH_SIZE, HIDDEN_LAYER_SIZE * NUM_HIDDEN_LAYERS ]
    H_in = tf.placeholder(
        tf.float32,
        [None, HIDDEN_LAYER_SIZE * NUM_HIDDEN_LAYERS],
        name='H_in')

    # Create desired number of hidden layers that use the `GRUCell`
    # for managing hidden state.
    cells = [
        rnn.GRUCell(HIDDEN_LAYER_SIZE)
        for _ in range(NUM_HIDDEN_LAYERS)
    ]
    multicell = rnn.MultiRNNCell(cells)
  55. UNROLLING RECURRENT NETWORK LAYERS

    Wrap the recurrent hidden layers in tf.nn.dynamic_rnn. Loops are unrolled when the computation graph runs, SEQUENCE_LEN times.

    Yr, H_out = tf.nn.dynamic_rnn(
        multicell, Xo, dtype=tf.float32, initial_state=H_in)
    # Yr = output of network: probability distribution of the next character.
    # H_out = the altered hidden cell state after processing the last input.
  56. OUTPUT IS PROBABILITY DISTRIBUTION

    from tensorflow.contrib import layers

    # [ BATCH_SIZE x SEQUENCE_LEN, HIDDEN_LAYER_SIZE ]
    Yflat = tf.reshape(Yr, [-1, HIDDEN_LAYER_SIZE])
    # [ BATCH_SIZE x SEQUENCE_LEN, ALPHABET_SIZE ]
    Ylogits = layers.linear(Yflat, ALPHABET_SIZE)
    # [ BATCH_SIZE x SEQUENCE_LEN, ALPHABET_SIZE ]
    Yo = tf.nn.softmax(Ylogits, name='Yo')
  57. PICK MOST PROBABLE CHARACTER

    # [ BATCH_SIZE * SEQUENCE_LEN ]
    Y = tf.argmax(Yo, 1)
    # [ BATCH_SIZE, SEQUENCE_LEN ]
    Y = tf.reshape(Y, [BATCH_SIZE, -1], name="Y")
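Outside the graph, the predicted integer ids can be turned back into text; a sketch assuming the `int_to_char` lookup built during preprocessing:

    def decode(predicted_ids, int_to_char):
        # predicted_ids: the integer character ids produced by Y for one sequence.
        return ''.join(int_to_char[int(i)] for i in predicted_ids)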
  58. LOSS FUNCTION Needs: 1. the real output of the network

    after each batch 2. the expected output (from our training data)
  59. LOSS FUNCTION

    [Architecture diagram: Input (current char) → Hidden Recurrent Layers → Output (predicted next char) → Accuracy/Loss Calculation, compared against the expected next char]
  60. LOSS FUNCTION

    Input the expected next chars into the network:

    # [ BATCH_SIZE, SEQUENCE_LEN ]
    Y_ = tf.placeholder(tf.uint8, [None, None], name='Y_')
    # [ BATCH_SIZE, SEQUENCE_LEN, ALPHABET_SIZE ]
    Yo_ = tf.one_hot(Y_, ALPHABET_SIZE, 1.0, 0.0)
    # [ BATCH_SIZE x SEQUENCE_LEN, ALPHABET_SIZE ]
    Yflat_ = tf.reshape(Yo_, [-1, ALPHABET_SIZE])
  61. LOSS FUNCTION

    Defining the loss function:

    # [ BATCH_SIZE * SEQUENCE_LEN ]
    loss = tf.nn.softmax_cross_entropy_with_logits(
        logits=Ylogits, labels=Yflat_)
    # [ BATCH_SIZE, SEQUENCE_LEN ]
    loss = tf.reshape(loss, [BATCH_SIZE, -1])
  62. CHOOSE AN OPTIMISER

    Will adjust the network weights to minimise the loss. In the workshop we'll use a flavour called AdamOptimizer.

    train_step = tf.train.GradientDescentOptimizer(lr).minimize(loss)
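For reference, swapping in the Adam flavour mentioned above is a one-line change, as in the full listing on slide 4 (`lr` is the learning rate):

    train_step = tf.train.AdamOptimizer(lr).minimize(loss)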
  63. EPOCHS We run mini-batch training on the network. Train network

    on all batches multiple times. Each run across all batches is an epoch. More epochs = better weights = better accuracy.
  64. MINIBATCH SPLITTING ACROSS EPOCHS

    from typing import Generator, List, Tuple

    # Contains: [Training Data, Test Data, Epoch Number]
    Batch = Tuple[np.matrix, np.matrix, int]

    def rnn_minibatch_generator(
            data: List[int],
            batch_size: int,
            sequence_length: int,
            num_epochs: int) -> Generator[Batch, None, None]:
        for epoch in range(num_epochs):
            for batch in range(num_batches):
                # Split data into batches, where each batch contains
                # `batch_size` sequences of length `sequence_length`.
                training_data = ...
                test_data = ...
                yield training_data, test_data, epoch
  65. START TRAINING

    Load the dataset and construct the mini-batch generator:

    # Initialize the hidden cell states to 0 before running any steps.
    input_state = np.zeros(
        [BATCH_SIZE, HIDDEN_LAYER_SIZE * NUM_HIDDEN_LAYERS])

    # Create the session and initialize its variables to 0.
    init = tf.global_variables_initializer()
    session = tf.Session()
    session.run(init)

    char_integer_list = []   # the encoded character ids loaded from the dataset
    generator = rnn_minibatch_generator(
        char_integer_list, BATCH_SIZE, SEQUENCE_LEN, num_epochs=10)
  66. Run the training step on all mini-batches for multiple epochs:

    # Initialise input state
    step = 0
    input_state = np.zeros([
        BATCH_SIZE, HIDDEN_LAYER_SIZE * NUM_HIDDEN_LAYERS
    ])

    # Run training step loop
    for batch_input, expected_batch_output, epoch in generator:
        graph_inputs = {
            X: batch_input,
            Y_: expected_batch_output,
            H_in: input_state,
            batch_size: BATCH_SIZE
        }
        _, output, output_state = session.run(
            [train_step, Y, H_out], feed_dict=graph_inputs)

        # Loop state around for next recurrent run
        input_state = output_state
        step += BATCH_SIZE * SEQUENCE_LEN
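Once training finishes, text can be generated by repeatedly feeding the last predicted character (and the returned cell state) back into the network. A rough sketch, assuming a graph built with batch size 1 and the `char_to_int` / `int_to_char` lookups from preprocessing; this is not the workshop's exact generation code:

    # Seed with a single character and an all-zero cell state.
    generated = [char_to_int['T']]
    state = np.zeros([1, HIDDEN_LAYER_SIZE * NUM_HIDDEN_LAYERS])

    for _ in range(1000):
        feed = {X: [[generated[-1]]], H_in: state}
        next_ids, state = session.run([Y, H_out], feed_dict=feed)
        generated.append(int(next_ids[0][0]))

    print(''.join(int_to_char[i] for i in generated))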
  67. Epoch 0.0 Dy8v:SH3U 2d4 xZ Vaf%hO kS0i6 7y U5SUu6nSsR0 x

    MYiZ5ykLOtG3Q,cu St k V ctc_N CQFSbF%]q3ZsWWK8wP gyfYt3DpFo yhZ_ss,"IedX%lj,R%_4ux IX5 R%N3wQNG PnSl 1DJqLdpc[kLeSYMoE]kf xCe29 J[r_k 6BiUs GUguW Y [Kw8"P Sg" e[2OCL%G mad6,:J[A k 5 jz 46iyQLuuT 9qTn GjT6:dSjv6RXMyjxX8:3 h cr sYBgnc8 DP04A8laW
  68. Epoch 0.1 Uum awetuarteeuF toBdU iwObaaMlr o rM OufNJetu iida

    cZeDbRuZfU m igdaao QH NBJ diace e L cjoXeu ZDjiM AeN g iu O Aoc jdjrmIuaai ie t qmuozPwaEkoihca eXuzRCgZ iW AeqapiwaT VInBosPkqroi s yWbJoj yKq oUo jebaYigEouzxVb eyt Px hiamIf vPOiiPu ky Cut LviPoej iE w hpFVxes h zwsvoidmoWxzgTnL ujDt Pr a
  69. Epoch 1 Here is the goal of my further. I

    shouldn't be the shash of no. Sky is bright and blue as running goeg on. Paur decided to move downwards to the floor, where the treasure was stored. She then thought to call her friend from ahead.
  70. ...

  71. Epoch 50 Gradually drawing away from the rest, two combatants

    are striving; each devoting every nerve, every energy, to the overthrow of the other. But each attack is met by counter attack, each terrible swinging stroke by the crash of equally hard pain or the dull slap of tough hard shield opposed in parry. More men are down. Even numbers of men on each side, these two combatants strive on.
  72. /* * Increment the size file of the new incorrect
     * UI_FILTER group information * of the size generatively. */

    static int indicate_policy(void) {
        int error;
        if (fd == MARN_EPT) {
            /* The kernel blank will coeld it to userspace. */
            if (ss->segment < mem_total)
                unblock_graph_and_set_blocked();
            else
                ret = 1;
            goto bail;
        }
        segaddr = in_SB(in.addr);
        selector = seg / 16;
        setup_works = true;
        for (i = 0; i < blocks; i++) {
            seq = buf[i++];
            bpf = bd->bd.next + i * search;
            if (fd) {
                current = blocked;
            }
        }
        rw->name = "Getjbbregs";
        bprm_self_clearl(&iv->version);
        regs->new = blocks[(BPF_STATS << info->historidac)] | PFMR_CLOBATHINC_SECONDS << 12;
        return segtable;
    }
  73. We have created an AI author! Less than 100 lines of Tensorflow code!

    (The full listing repeated here is the same as on slide 4.)