
Python AI - All you need to know about Machine Learning and Deep Learning


Alejandro Saucedo (Founder @ Exponential Technologies) and Donald Whyte (Software Engineer @ Engineers Gate) @ Moscow Python Conf 2017
"There is a lot of hype about deep learning and everything AI, however behind all the noise there is a set of solid concepts and algorithms that have massive potential if used in the right way and with the right data. In this talk we will provide you with the key concepts you will need to build a solid understanding around the core of Machine Learning. We will also cover key Deep Learning concepts and examples using Tensorflow that will help you understand the real potential of Deep Learning in practical applications. This talk will provide a theoretical overview that will then be put in practice in the Deep Learning workshop".
Video: https://conf.python.ru/python-ai-all-you-need-know-about-machine-learning-and-deep-learning/

Moscow Python Meetup

October 20, 2017



Transcript

  1. DEEP LEARNING WITH RECURRENT
    NEURAL NETWORKS
    IN PYTHON
    /
    /
    Donald Whyte @donald_whyte
    Alejandro Saucedo @AxSaucedo


  2. ABOUT US
    Alejandro Saucedo Donald Whyte


  3. CREATE AN AI AUTHOR.
    Create a neural network that can write novels.
    Using 34,000 English novels to train the network.


  4. THE OUTPUT
    Gradually drawing away from the rest, two
    combatants are striving; each devoting every nerve,
    every energy, to the overthrow of the other.
    But each attack is met by counter attack, each
    terrible swinging stroke by the crash of equally hard
    pain or the dull slap of tough hard shield opposed
    in parry.
    More men are down. Even numbers of men on each
    side, these two combatants strive on.


  5. Less than 100 lines of Tensorflow code!
    import tensorflow as tf
    from tensorflow.contrib import layers, rnn
    import os
    import time
    import math
    import numpy as np
    tf.set_random_seed(0)
    # model parameters
    SEQLEN = 30
    BATCHSIZE = 200
    ALPHASIZE = 89
    INTERNALSIZE = 512
    NLAYERS = 3
    learning_rate = 0.001
    dropout_pkeep = 0.8
    # load_data() and txt.rnn_minibatch_sequencer() are helpers provided
    # by the workshop repository
    codetext, valitext, bookranges = load_data()
    # the model
    lr = tf.placeholder(tf.float32, name='lr')        # learning rate
    pkeep = tf.placeholder(tf.float32, name='pkeep')  # dropout parameter (unused in this trimmed listing)
    batchsize = tf.placeholder(tf.int32, name='batchsize')
    # inputs
    X = tf.placeholder(tf.uint8, [None, None], name='X')
    Xo = tf.one_hot(X, ALPHASIZE, 1.0, 0.0)
    # expected outputs
    Y_ = tf.placeholder(tf.uint8, [None, None], name='Y_')
    Yo_ = tf.one_hot(Y_, ALPHASIZE, 1.0, 0.0)
    # input state
    Hin = tf.placeholder(tf.float32, [None, INTERNALSIZE*NLAYERS], name='Hin')
    # hidden layers
    cells = [rnn.GRUCell(INTERNALSIZE) for _ in range(NLAYERS)]
    multicell = rnn.MultiRNNCell(cells, state_is_tuple=False)
    Yr, H = tf.nn.dynamic_rnn(multicell, Xo, dtype=tf.float32, initial_state=Hin)
    H = tf.identity(H, name='H')
    # Softmax layer implementation
    Yflat = tf.reshape(Yr, [-1, INTERNALSIZE])
    Ylogits = layers.linear(Yflat, ALPHASIZE)
    Yflat_ = tf.reshape(Yo_, [-1, ALPHASIZE])
    loss = tf.nn.softmax_cross_entropy_with_logits(logits=Ylogits, labels=Yflat_)
    loss = tf.reshape(loss, [batchsize, -1])
    Yo = tf.nn.softmax(Ylogits, name='Yo')
    Y = tf.argmax(Yo, 1)
    Y = tf.reshape(Y, [batchsize, -1], name="Y")
    trainstep = tf.train.AdamOptimizer(lr).minimize(loss)
    # Init for saving models
    if not os.path.exists("checkpoints"):
        os.mkdir("checkpoints")
    saver = tf.train.Saver(max_to_keep=1000)
    timestamp = str(math.trunc(time.time()))
    _50_BATCHES = 50 * BATCHSIZE * SEQLEN  # checkpoint interval
    # init
    istate = np.zeros([BATCHSIZE, INTERNALSIZE*NLAYERS])
    init = tf.global_variables_initializer()
    sess = tf.Session()
    sess.run(init)
    step = 0
    # train on one minibatch at a time
    for x, y_, epoch in txt.rnn_minibatch_sequencer(codetext, BATCHSIZE, SEQLEN, nb_epochs=10):
        feed_dict = {X: x, Y_: y_, Hin: istate, lr: learning_rate,
                     pkeep: dropout_pkeep, batchsize: BATCHSIZE}
        _, y, ostate = sess.run([trainstep, Y, H], feed_dict=feed_dict)
        if step // 10 % _50_BATCHES == 0:
            saved_file = saver.save(sess, 'checkpoints/rnn_train_' + timestamp, global_step=step)
            print("Saved file: " + saved_file)
        istate = ostate
        step += BATCHSIZE * SEQLEN


  6. GUTENBERG DATASET
    Contains 34,000 English novels.
    https://www.gutenberg.org/


  7. ['h','e','l','l','o',' ',
    'm','y',' ','n','a','m','e',' ','i','s', ...]
    [10, 5, 12, 12, 17, 27, 15, 25, 27, 16, 1, 15, 5, 27, 6, 18, ...]
    merge all training documents into one
    load as a flat list of chars
    convert chars to integers
    the result is a flat sequence of integers that represents all text in the dataset
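
    A minimal sketch of this preprocessing step (the corpus, vocabulary and
    integer codes here are illustrative, not the workshop's actual mapping):

    # Sketch: turn raw text into a flat sequence of integers.
    documents = ["hello my name is alice.", "it was a dark and stormy night."]
    # 1. merge all training documents into one string
    corpus = " ".join(documents)
    # 2. load it as a flat list of chars
    chars = list(corpus)
    # 3. convert chars to integers using a fixed vocabulary
    vocab = sorted(set(corpus))
    char_to_int = {c: i for i, c in enumerate(vocab)}
    encoded = [char_to_int[c] for c in chars]
    print(chars[:10])    # ['h', 'e', 'l', 'l', 'o', ' ', 'm', 'y', ' ', 'n']
    print(encoded[:10])  # flat sequence of integers representing the text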


  8. COME TO OUR WORKSHOP!


  9. 1. TRADITIONAL SUPERVISED
    LEARNING
    Use labelled historical data to predict future outcomes


  10. Given some input data, predict the correct output
    What features of the input tell us about the output?


  11. FEATURE SPACE
    A feature is some property that describes raw input data
    Features represented as a vector in feature space
    Features abstract away the complexity of the raw input for easier processing


  12. CLASSIFICATION
    Training data is used to produce a model.
    The model divides feature space into segments.
    Each segment corresponds to one output class.
    f(x̄) = m·x̄ + c


  13. Use trained model to predict outcome of new, unseen data.


  14. EXAMPLE
    x̄ = (area, perimeter)
    m = (1, −3.5)
    c = 0


  15. Using this model, let's predict what shape an object is.
    Feature Value
    Area 3
    Perimeter 1


  16. x̄ = (3, 1)
    f(x̄) = 1(3) + (−3.5)(1) + 0
    f(x̄) = 3 − 3.5 + 0
    f(x̄) = −0.5
    −0.5 < 0 means left side of the line.
    The input shape is a triangle.
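
    A tiny Python sketch of the worked example above, assuming the weights
    m = (1, -3.5) and bias c = 0 from the earlier slide (the name of the
    non-triangle class is not given in the deck, so it is left generic):

    # Evaluate f(x) = m·x + c and check which side of the line the input is on.
    m = (1.0, -3.5)   # learned weights
    c = 0.0           # learned bias

    def f(x):
        return m[0] * x[0] + m[1] * x[1] + c

    x = (3.0, 1.0)    # (area, perimeter) of the unseen shape
    score = f(x)
    print(score)                                      # -0.5
    print("triangle" if score < 0 else "other class")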


  17. THE HARD PART
    Learning m and c.


  18. Can this be used to learn how to write novels?


  19. No.


  20. GENERATING COHERENT TEXT REQUIRES MEMORY OF
    WHAT WAS WRITTEN PREVIOUSLY.
    Male Person: Valentin, he
    Drinks: drink, beer, lagers
    Valentin's favourite drink is beer. He likes
    lagers the most.


  21. How do we do this?


  22. 2. DEEP NEURAL NETWORKS


  23. Deep neural nets can learn patterns in complex data, like
    language.
    We can encode memory into the algorithm.


  24. Just use the raw input data.
    Our training data is the raw text of existing novels.
    No need for manual feature extraction.


  25. THE MIGHTY PERCEPTRON
    Equivalent to the straight line equation from before
    Linearly splits feature space
    Modeled after a neuron in the human brain


  26. THE MIGHTY PERCEPTRON
    Analogous to our linear function f(x) = mx + c
    For n features, the perceptron is defined as:
    y = f(w · x + b)
    n-dimensional weight vector w
    n-dimensional input vector x
    bias scalar b
    activation function f
    output y


  27. THE MIGHTY PERCEPTRON
    y = f(w · x + b)


  28. ACTIVATION FUNCTION
    Simulates the 'firing' of a physical neuron.
    Takes the weighted sum and squashes it into a smaller
    range.


  29. ACTIVATION FUNCTION
    SIGMOID FUNCTION
    Squashes perceptron output into range [0,1]
    Used to learn weights (w)
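
    A minimal numpy sketch of a single perceptron with a sigmoid activation
    (the weights, bias and input below are made up for illustration):

    import numpy as np

    def sigmoid(s):
        # squashes the weighted sum into the range (0, 1)
        return 1.0 / (1.0 + np.exp(-s))

    def perceptron(x, w, b):
        # y = f(w · x + b)
        return sigmoid(np.dot(w, x) + b)

    w = np.array([0.4, -0.6, 0.2])   # n-dimensional weight vector
    b = 0.1                          # bias scalar
    x = np.array([1.0, 2.0, 3.0])    # n-dimensional input vector
    print(perceptron(x, w, b))       # output y, somewhere in (0, 1)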


  30. How do we learn w and b?


  31. PERCEPTRON LEARNING ALGORITHM
    Algorithm which learns correct weights and bias
    Use training dataset to incrementally train perceptron
    Guaranteed to create line that divides output classes
    (if data is linearly separable)
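
    A sketch of the classic perceptron update rule on a toy, linearly separable
    dataset (the data, learning rate and epoch count are invented for
    illustration; the workshop does not use this exact code):

    import numpy as np

    # Toy dataset: label 1 if x0 + x1 > 1, else 0 (linearly separable).
    X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [2.0, 0.5]])
    t = np.array([0, 0, 0, 1, 1])

    w = np.zeros(2)   # weights, learned incrementally
    b = 0.0           # bias
    eta = 0.1         # learning rate

    for epoch in range(20):
        for x_i, t_i in zip(X, t):
            y_i = 1 if np.dot(w, x_i) + b > 0 else 0   # step activation
            # nudge the line towards any misclassified point
            w += eta * (t_i - y_i) * x_i
            b += eta * (t_i - y_i)

    print(w, b)   # a line that separates the two classes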


  36. REPRESENTING TEXT
    Make the input layer represent:
    a single word
    or a single character
    Use the input word/char to predict the next.


  37. WE WILL USE CHARACTERS AS THE INPUTS.
    [diagram: current char fed into the network's input nodes; next char read from the output nodes]
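
    A sketch of how one character becomes an input vector, assuming a tiny
    illustrative alphabet (the real alphabet in the deck has roughly 90 characters):

    import numpy as np

    alphabet = list("abcdefghijklmnopqrstuvwxyz ")
    char_to_index = {c: i for i, c in enumerate(alphabet)}

    def one_hot(char):
        # all zeros except a 1.0 at the character's index
        vec = np.zeros(len(alphabet))
        vec[char_to_index[char]] = 1.0
        return vec

    print(one_hot('b'))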


  38. Input: b  Predicted char: ?
    Current sentence: b?


  39. Input: b  Predicted char: a
    Current sentence: ba


  40. Input: a  Predicted char: d
    Current sentence: bad


  41. ...


  42. Input: e  Predicted char: d
    Current sentence: ball games were often played


  43. #WINNING


  44. PROBLEM
    Single perceptrons are straight line equations.
    Produce a single output, and hence cannot be used for
    complex problems like language.
    Need a network of neurons to output the full one-hot vector.


  45. SOLUTION: NEURAL NETWORKS
    Uses multi-layer perceptrons to:
    learn patterns in complex data, like language
    produce the multiple outputs required for text prediction
    Has multiple layers to provide flexibility on learning


  46. #WINNING


  47. NEURON CONNECTIVITY
    Each layer is fully connected to the next:
    all nodes in layer l are connected to all nodes in layer l + 1.
    Every single connection has a weight.


  48. Produces multiple weight matrices
    One for each layer
    Which allows us to...


  49. TRAINING NEURAL NETWORKS
    Learn the weight matrices!
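
    A minimal sketch of what "one weight matrix per layer" looks like for a tiny
    network (layer sizes and random weights are arbitrary, for illustration only):

    import numpy as np

    def sigmoid(s):
        return 1.0 / (1.0 + np.exp(-s))

    rng = np.random.default_rng(0)

    # One weight matrix (and bias vector) per layer.
    W1 = rng.normal(size=(4, 3))   # 3 input features -> 4 hidden neurons
    b1 = np.zeros(4)
    W2 = rng.normal(size=(2, 4))   # 4 hidden neurons -> 2 outputs
    b2 = np.zeros(2)

    x = np.array([0.5, -1.0, 2.0])        # input feature vector
    hidden = sigmoid(W1 @ x + b1)         # layer 1 output
    output = sigmoid(W2 @ hidden + b2)    # layer 2 output (multiple outputs)
    print(output)
    # Training means learning the values inside W1, b1, W2 and b2.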


  50. LOSS FUNCTION
    AN OPTIMIZATION PROBLEM
    Inputs:
    1. the real output of the network after each batch
    2. the expected output (from our training data)
    Outputs:
    Number indicating performance of network.


  51. Lower loss values = better performance
    Better performance = better prediction accuracy


  52. GRADIENT DESCENT OPTIMISER
    We optimise the network by minimising its loss.
    Keep adjusting the weights of each hidden layer...
    ...until loss is not getting any smaller.
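
    A toy sketch of the idea, minimising a one-parameter loss L(w) = (w - 3)^2
    (real networks do the same over millions of weights, with the gradients
    supplied by backpropagation):

    def loss(w):
        return (w - 3.0) ** 2

    def gradient(w):
        return 2.0 * (w - 3.0)

    w = 0.0
    learning_rate = 0.1
    for step in range(50):
        w -= learning_rate * gradient(w)   # adjust the weight against the gradient

    print(w, loss(w))   # w ends up close to 3, loss close to 0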


  53. BACKPROPAGATION
    How gradient descent is applied to neural networks
    The training algorithm for neural networks
    For each feature vector in the training dataset, do a:
    1. forward pass
    2. backward pass


  54. FORWARD PASS
    expected output


  55. BACKWARD PASS
    expected output


  56. After training the network, we obtain weights which
    minimise prediction error.
    Predict next character by running the last character through
    the forward pass step.


  57. #WINNING


  58. HOWEVER...
    Network still has no memory of past characters.
    Valentin's favourite drink is beer. He likes
    lagers the most.


  59. 3. DEEP RECURRENT NETWORKS


  60. SINGLE NEURON — ONE OUTPUT


  61. NEURAL NETWORK — MULTIPLE OUTPUTS


  62. DEEP NETWORKS — MANY HIDDEN LAYERS


  63. SIMPLIFIED VISUALISATION
    One node represents a full layer of neurons.


  65. RECURRENT NETWORKS
    [diagram: each layer's output (O_0, O_1, O_2) is fed back into that same layer]
    A hidden layer's input includes the output of itself during the
    last run of the network.
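
    A numpy sketch of one recurrent layer, where the layer's previous output is
    part of its next input (the tanh cell and random weights are illustrative;
    the deck's networks use GRU cells, introduced later):

    import numpy as np

    def recurrent_layer(x, h_prev, W_x, W_h, b):
        # the layer sees the current input x and its own previous output h_prev
        return np.tanh(W_x @ x + W_h @ h_prev + b)

    rng = np.random.default_rng(0)
    hidden_size, input_size = 4, 3
    W_x = rng.normal(size=(hidden_size, input_size))
    W_h = rng.normal(size=(hidden_size, hidden_size))
    b = np.zeros(hidden_size)

    h = np.zeros(hidden_size)   # no previous output at t=0
    sequence = [rng.normal(size=input_size) for _ in range(5)]
    for t, x_t in enumerate(sequence):
        h = recurrent_layer(x_t, h, W_x, W_h, b)   # feed the last output back in
        print(t, h)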


  66. UNROLLED RECURRENT NETWORK
    Previous predictions help make the next prediction.
    Each prediction is a time step.
    Time


  67.-75. [animation over time steps t=0…t=5: at each step the network's predicted
    character is fed back in as the next input, spelling out a sentence
    ("B", "o", "b", "i", "s", ...) one character at a time]


  76. PROBLEM: LONG-TERM DEPENDENCIES
    [diagram: a much longer unrolled character sequence, where the correct prediction
    at the end depends on characters seen many time steps earlier]


  77. CELL STATES
    Add extra state to each layer of the network.
    Remembers inputs far into the past.
    Transforms layer's original output into something that is
    relevant to the current context.
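
    A rough sketch of the gating idea behind such cells (in the spirit of the GRU
    cells used later; the numbers are illustrative, and real cells also learn how
    to compute the gate and the candidate from the input):

    import numpy as np

    def sigmoid(s):
        return 1.0 / (1.0 + np.exp(-s))

    def gated_update(h_prev, candidate, z):
        # gate z decides, per unit, how much old state to keep
        # and how much to overwrite with the new candidate
        return (1.0 - z) * h_prev + z * candidate

    h_prev = np.array([0.9, -0.2, 0.5])       # state remembered from the past
    candidate = np.array([0.1, 0.8, -0.3])    # new information from the current input
    z = sigmoid(np.array([-2.0, 2.0, 0.0]))   # per-unit keep/overwrite gate

    print(gated_update(h_prev, candidate, z))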


  78. [diagram: each layer's output O_0, O_1, O_2 is paired with its own cell state
    H_0, H_1, H_2]


  79. [diagram: the network with cell states, unrolled over time]


  80. The hidden layer output and cell state are fed into the next time step.
    This gives the network the ability to handle long-term dependencies in
    sequences.


  81. [diagram: the long sequence from before, now with cell state carried between
    time steps so earlier characters can influence later predictions]


  82. 4. TRAINING RNNS


  83. These recurrent networks are trained in the same way as
    regular networks.


  84. Backpropagation and gradient descent.


  85. [diagram: backward pass through the unrolled network against the expected output]


  86. We need data to train the network.


  87. GUTENBERG DATASET
    Contains 34,000 English novels.
    https://www.gutenberg.org/


  88. ['h','e','l','l','o',' ',
    'm','y',' ','n','a','m','e',' ','i','s', ...]
    [10, 5, 12, 12, 17, 27, 15, 25, 27, 16, 1, 15, 5, 27, 6, 18, ...]
    merge all training documents into one
    load as a flat list of chars
    convert chars to integers
    use the integers to generate one-hot inputs for each time step


  89. COMMON TRAINING METHODS
    Run backpropagation after:
    Stochastic: one sequence
    Batch: all sequences
    Mini-Batch: a smaller batch of b sequences


  90. We'll use mini-batch.


  91. WHY?
    Stochastic: long time to converge on good weights
    Batch: consumes lots of memory, gets stuck on
    "okay" weights
    Mini-Batch: quick to converge and memory efficient


  92. Iterate across all batches.
    Run backpropagation after processing each batch.
    [10, 5, 12, 12, 17, 27, 15, 25, 27, 16, 1, 15, 5, 27, 6, 18, ...]
    [10, 5, 12, 12]  [17, 27, 15, 25]  [27, 16, 1, 15]  [5, 27, 6, 18]  ...
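
    A sketch of that splitting in plain Python (the sequence length and batch size
    mirror the toy example above, not the real hyperparameters):

    data = [10, 5, 12, 12, 17, 27, 15, 25, 27, 16, 1, 15, 5, 27, 6, 18]
    sequence_length = 4
    batch_size = 2

    # cut the flat integer sequence into fixed-length training sequences
    sequences = [
        data[i:i + sequence_length]
        for i in range(0, len(data) - sequence_length + 1, sequence_length)
    ]
    # group the sequences into mini-batches
    mini_batches = [
        sequences[i:i + batch_size]
        for i in range(0, len(sequences), batch_size)
    ]

    for batch in mini_batches:
        print(batch)   # run one backpropagation step after each of these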


  93. 5. NEURAL NETS IN PYTHON


  94. Building a neural network involves:
    1. defining its architecture
    2. learning the weight matrices for that architecture


  95. PROBLEM: COMPLEX GRAPHS


  96. PROBLEM: COMPLEX DERIVATIONS


  97. SOLUTION
    Can build very complex networks quickly
    Easy to extend if required
    Built-in support for RNN memory cells


  98. OTHER PYTHON NEURAL NET LIBRARIES


  99. 6. TENSORFLOW
    BUILDING OUR MODEL


  100. BUILD A RECURRENT NEURAL NETWORK TO GENERATE
    STORIES IN TENSORFLOW.
    [diagram: current char → Input → Hidden Recurrent Layers → Output → predicted next char]


  101. HOW?
    Build a computation graph that learns the weights of our
    network.


  102. THE COMPUTATION GRAPH
    tf.Tensor: unit of data. Vectors or matrices of values (floats, ints, etc.).
    tf.Operation: unit of computation. Takes 0+ tf.Tensors as inputs and outputs 0+ tf.Tensors.
    tf.Graph: collection of connected tf.Tensors and tf.Operations.
    Operations are nodes and tensors are edges.


  103. THE COMPUTATION GRAPH


  104. GRAPH THAT TRIPLES NUMBERS AND SUMS THEM.
    # 1. Define Inputs
    # Input is a 2D vector containing the two numbers to triple.
    inputs = tf.placeholder(tf.float32, [2])
    # 2. Define Internal Operations
    tripled_numbers = tf.scalar_mul(3, inputs)
    # 3. Define Final Output
    # Sum the previously tripled inputs.
    output_sum = tf.reduce_sum(tripled_numbers)
    # 4. Run the graph with some inputs to produce the output.
    session = tf.Session()
    result = session.run(output_sum, feed_dict={inputs: [300, 10]})
    print(result)
    Output: 930


  105. DEFINING HYPERPARAMETERS
    [diagram: Input → Hidden Recurrent Layers → Output]
    # Input Hyperparameters
    SEQUENCE_LEN = 30
    BATCH_SIZE = 200
    ALPHABET_SIZE = 98
    # Hidden Recurrent Layer Hyperparameters
    HIDDEN_LAYER_SIZE = 512
    NUM_HIDDEN_LAYERS = 3


  106. [diagram: current char → Input]
    # Dimensions: [ BATCH_SIZE, SEQUENCE_LEN ]
    X = tf.placeholder(tf.uint8, [None, None], name='X')


  107. [diagram: current char → Input]
    # Dimensions: [ BATCH_SIZE, SEQUENCE_LEN, ALPHABET_SIZE ]
    Xo = tf.one_hot(X, ALPHABET_SIZE, 1.0, 0.0)


  108. DEFINING HIDDEN STATE
    [diagram: current char → Input → Hidden Recurrent Layers]


  109. DEFINING HIDDEN STATE
    from tensorflow.contrib import rnn
    # Cell State
    # [ BATCH_SIZE, HIDDEN_LAYER_SIZE * NUM_HIDDEN_LAYERS]
    H_in = tf.placeholder(
    tf.float32,
    [None, HIDDEN_LAYER_SIZE * NUM_HIDDEN_LAYERS],
    name='H_in')
    # Create desired number of hidden layers that use the `GRUCell`
    # for managing hidden state.
    cells = [
    rnn.GRUCell(HIDDEN_LAYER_SIZE)
    for _ in range(NUM_HIDDEN_LAYERS)
    ]
    multicell = rnn.MultiRNNCell(cells, state_is_tuple=False)  # flat state tensor, matching H_in above


  110. UNROLLING RECURRENT NETWORK LAYERS
    [diagram: the recurrent layers unrolled over time]


  111. UNROLLING RECURRENT NETWORK LAYERS
    Wrap recurrent hidden layers in tf.nn.dynamic_rnn.
    Unrolls loops when the computation graph is running.
    Loops will be unrolled SEQUENCE_LEN times.
    Yr, H_out = tf.nn.dynamic_rnn(
    multicell,
    Xo,
    dtype=tf.float32,
    initial_state=H_in)
    # Yr = raw outputs of the last hidden layer for every time step
    # (turned into a probability distribution over the next character
    # by the softmax layer defined next).
    # H_out = the altered hidden cell state after processing the
    # last input.


  112. OUTPUT IS PROBABILITY DISTRIBUTION
    [diagram: Input → Hidden Recurrent Layers → Output → predicted next char]
    from tensorflow.contrib import layers
    # [ BATCH_SIZE x SEQUENCE_LEN, HIDDEN_LAYER_SIZE ]
    Yflat = tf.reshape(Yr, [-1, HIDDEN_LAYER_SIZE])
    # [ BATCH_SIZE x SEQUENCE_LEN, ALPHABET_SIZE ]
    Ylogits = layers.linear(Yflat, ALPHABET_SIZE)
    # [ BATCH_SIZE x SEQUENCE_LEN, ALPHABET_SIZE ]
    Yo = tf.nn.softmax(Ylogits, name='Yo')
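
    For intuition, this is what the softmax does to one row of logits (numbers
    are illustrative; tf.nn.softmax applies the same transformation row by row):

    import numpy as np

    logits = np.array([2.0, 1.0, 0.1])
    exp = np.exp(logits - logits.max())   # subtract the max for numerical stability
    probabilities = exp / exp.sum()
    print(probabilities)        # roughly [0.66, 0.24, 0.10]
    print(probabilities.sum())  # 1.0 -- a probability distribution over the alphabet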


  113. PICK MOST PROBABLE CHARACTER
    [diagram: Output → predicted next char]
    # [ BATCH_SIZE * SEQUENCE_LEN ]
    Y = tf.argmax(Yo, 1)
    # [ BATCH_SIZE, SEQUENCE_LEN ]
    Y = tf.reshape(Y, [BATCH_SIZE, -1], name="Y")


  114. Remaining tasks:
    define our loss function
    decide what weight optimiser to use


  115. LOSS FUNCTION
    Needs:
    1. the real output of the network after each batch
    2. the expected output (from our training data)


  116. LOSS FUNCTION
    [diagram: network Output and expected next char → Accuracy/Loss Calculation → loss]


  117. LOSS FUNCTION
    Input expected next chars into network:
    # [ BATCH_SIZE, SEQUENCE_LEN ]
    Y_ = tf.placeholder(tf.uint8, [None, None], name='Y_')
    # [ BATCH_SIZE, SEQUENCE_LEN, ALPHABET_SIZE ]
    Yo_ = tf.one_hot(Y_, ALPHABET_SIZE, 1.0, 0.0)
    # [ BATCH_SIZE x SEQUENCE_LEN, ALPHABET_SIZE ]
    Yflat_ = tf.reshape(Yo_, [-1, ALPHABET_SIZE])


  118. LOSS FUNCTION
    Defining the loss function:
    # [ BATCH_SIZE * SEQUENCE_LEN ]
    loss = tf.nn.softmax_cross_entropy_with_logits(
    logits=Ylogits,
    labels=Yflat_)
    # [ BATCH_SIZE, SEQUENCE_LEN ]
    loss = tf.reshape(loss, [BATCH_SIZE, -1])


  119. CHOOSE AN OPTIMISER
    Will adjust network weights to minimise the loss.
    In the workshop we'll use a flavour called
    AdamOptimizer.
    train_step = tf.train.GradientDescentOptimizer(lr).minimize(loss)
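
    For reference, the Adam variant mentioned above (and used in the full listing
    at the start of the deck) is a one-line change, assuming the same lr
    learning-rate placeholder:

    # Same graph, different optimiser: Adam adapts the step size per weight.
    train_step = tf.train.AdamOptimizer(lr).minimize(loss)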


  120. 7. TENSORFLOW
    TRAINING THE MODEL


  121. EPOCHS
    We run mini-batch training on the network.
    Train network on all batches multiple times.
    Each run across all batches is an epoch.
    More epochs generally means better weights and better accuracy (up to a point).


  122. MINIBATCH SPLITTING ACROSS EPOCHS
    from typing import Generator, List, Tuple
    import numpy as np
    # Contains: [Training Data, Test Data, Epoch Number]
    Batch = Tuple[np.matrix, np.matrix, int]
    def rnn_minibatch_generator(
            data: List[int],
            batch_size: int,
            sequence_length: int,
            num_epochs: int) -> Generator[Batch, None, None]:
        # number of full batches that fit in the data
        num_batches = len(data) // (batch_size * sequence_length)
        for epoch in range(num_epochs):
            for batch in range(num_batches):
                # split data into batches, where each batch contains
                # `batch_size` sequences of length `sequence_length`
                training_data = ...
                test_data = ...
                yield training_data, test_data, epoch


  123. START TRAINING
    Load dataset and construct mini-batch generator:
    # Initialize the hidden cell states to 0 before running any steps.
    input_state = np.zeros(
        [BATCH_SIZE, HIDDEN_LAYER_SIZE * NUM_HIDDEN_LAYERS])
    # Create the session and initialize the graph's variables.
    init = tf.global_variables_initializer()
    session = tf.Session()
    session.run(init)
    # Integer-encoded characters of the training corpus, produced by the
    # preprocessing step shown earlier.
    char_integer_list = []
    generator = rnn_minibatch_generator(
        char_integer_list,
        BATCH_SIZE,
        SEQUENCE_LEN,
        num_epochs=10)


  124. Run training step on all mini-batches for multiple epochs:
    # Initialise input state
    step = 0
    input_state = np.zeros([
        BATCH_SIZE, HIDDEN_LAYER_SIZE * NUM_HIDDEN_LAYERS
    ])
    # Run training step loop
    for batch_input, expected_batch_output, epoch in generator:
        graph_inputs = {
            X: batch_input, Y_: expected_batch_output,
            H_in: input_state
        }
        _, output, output_state = session.run(
            [train_step, Y, H_out],
            feed_dict=graph_inputs)
        # Loop state around for next recurrent run
        input_state = output_state
        step += BATCH_SIZE * SEQUENCE_LEN


  125. FINAL RESULTS


  126. Epoch 0.0
    Dy8v:SH3U 2d4 xZ Vaf%hO kS0i6 7y U5SUu6nSsR0 x
    MYiZ5ykLOtG3Q,cu St k V ctc_N CQFSbF%]q3ZsWWK8wP
    gyfYt3DpFo yhZ_ss,"IedX%lj,R%_4ux IX5 R%N3wQNG PnSl
    1DJqLdpc[kLeSYMoE]kf xCe29 J[r_k 6BiUs GUguW Y [Kw8"P
    Sg" e[2OCL%G mad6,:J[A k 5 jz 46iyQLuuT 9qTn
    GjT6:dSjv6RXMyjxX8:3 h cr sYBgnc8 DP04A8laW


  127. Epoch 0.1
    Uum awetuarteeuF toBdU iwObaaMlr o rM OufNJetu iida
    cZeDbRuZfU m igdaao QH NBJ diace e L cjoXeu ZDjiM AeN
    g iu O Aoc jdjrmIuaai ie t qmuozPwaEkoihca eXuzRCgZ iW
    AeqapiwaT VInBosPkqroi s yWbJoj yKq oUo
    jebaYigEouzxVb eyt Px hiamIf vPOiiPu ky Cut LviPoej iE w
    hpFVxes h zwsvoidmoWxzgTnL ujDt Pr a


  128. Epoch 1
    Here is the goal of my further. I shouldn't be the shash of
    no. Sky is bright and blue as running goeg on.
    Paur decided to move downwards to the floor, where the
    treasure was stored. She then thought to call her friend
    from ahead.


  129. ...


  130. Epoch 50
    Gradually drawing away from the rest, two combatants are
    striving; each devoting every nerve, every energy, to the
    overthrow of the other.
    But each attack is met by counter attack, each terrible
    swinging stroke by the crash of equally hard pain or the
    dull slap of tough hard shield opposed in parry.
    More men are down. Even numbers of men on each side,
    these two combatants strive on.


  131. FURTHER EXAMPLES
    Andrej Karpathy's blog post:
    The Unreasonable Effectiveness of Recurrent Neural
    Networks


  132. /*
    * Increment the size file of the new incorrect UI_FILTER group information
    * of the size generatively.
    */
    static int indicate_policy(void)
    {
    int error;
    if (fd == MARN_EPT) {
    /* The kernel blank will coeld it to userspace. */
    if (ss->segment < mem_total)
    unblock_graph_and_set_blocked();
    else
    ret = 1;
    goto bail;
    }
    segaddr = in_SB(in.addr);
    selector = seg / 16;
    setup_works = true;
    for (i = 0; i < blocks; i++) {
    seq = buf[i++];
    bpf = bd->bd.next + i * search;
    if (fd) {
    current = blocked;
    }
    }
    rw->name = "Getjbbregs";
    bprm_self_clearl(&iv->version);
    regs->new = blocks[(BPF_STATS << info->historidac)] | PFMR_CLOBATHINC_SECONDS << 12;
    return segtable;
    }


  134. SUCCESS!


  135. We have created an AI author!
    Less than 100 lines of Tensorflow code!
    import tensorflow as tf
    from tensorflow.contrib import layers, rnn
    import os
    import time
    import math
    import numpy as np
    tf.set_random_seed(0)
    # model parameters
    SEQLEN = 30
    BATCHSIZE = 200
    ALPHASIZE = 89
    INTERNALSIZE = 512
    NLAYERS = 3
    learning_rate = 0.001
    dropout_pkeep = 0.8
    # load_data() and txt.rnn_minibatch_sequencer() are helpers provided
    # by the workshop repository
    codetext, valitext, bookranges = load_data()
    # the model
    lr = tf.placeholder(tf.float32, name='lr')        # learning rate
    pkeep = tf.placeholder(tf.float32, name='pkeep')  # dropout parameter (unused in this trimmed listing)
    batchsize = tf.placeholder(tf.int32, name='batchsize')
    # inputs
    X = tf.placeholder(tf.uint8, [None, None], name='X')
    Xo = tf.one_hot(X, ALPHASIZE, 1.0, 0.0)
    # expected outputs
    Y_ = tf.placeholder(tf.uint8, [None, None], name='Y_')
    Yo_ = tf.one_hot(Y_, ALPHASIZE, 1.0, 0.0)
    # input state
    Hin = tf.placeholder(tf.float32, [None, INTERNALSIZE*NLAYERS], name='Hin')
    # hidden layers
    cells = [rnn.GRUCell(INTERNALSIZE) for _ in range(NLAYERS)]
    multicell = rnn.MultiRNNCell(cells, state_is_tuple=False)
    Yr, H = tf.nn.dynamic_rnn(multicell, Xo, dtype=tf.float32, initial_state=Hin)
    H = tf.identity(H, name='H')
    # Softmax layer implementation
    Yflat = tf.reshape(Yr, [-1, INTERNALSIZE])
    Ylogits = layers.linear(Yflat, ALPHASIZE)
    Yflat_ = tf.reshape(Yo_, [-1, ALPHASIZE])
    loss = tf.nn.softmax_cross_entropy_with_logits(logits=Ylogits, labels=Yflat_)
    loss = tf.reshape(loss, [batchsize, -1])
    Yo = tf.nn.softmax(Ylogits, name='Yo')
    Y = tf.argmax(Yo, 1)
    Y = tf.reshape(Y, [batchsize, -1], name="Y")
    trainstep = tf.train.AdamOptimizer(lr).minimize(loss)
    # Init for saving models
    if not os.path.exists("checkpoints"):
        os.mkdir("checkpoints")
    saver = tf.train.Saver(max_to_keep=1000)
    timestamp = str(math.trunc(time.time()))
    _50_BATCHES = 50 * BATCHSIZE * SEQLEN  # checkpoint interval
    # init
    istate = np.zeros([BATCHSIZE, INTERNALSIZE*NLAYERS])
    init = tf.global_variables_initializer()
    sess = tf.Session()
    sess.run(init)
    step = 0
    # train on one minibatch at a time
    for x, y_, epoch in txt.rnn_minibatch_sequencer(codetext, BATCHSIZE, SEQLEN, nb_epochs=10):
        feed_dict = {X: x, Y_: y_, Hin: istate, lr: learning_rate,
                     pkeep: dropout_pkeep, batchsize: BATCHSIZE}
        _, y, ostate = sess.run([trainstep, Y, H], feed_dict=feed_dict)
        if step // 10 % _50_BATCHES == 0:
            saved_file = saver.save(sess, 'checkpoints/rnn_train_' + timestamp, global_step=step)
            print("Saved file: " + saved_file)
        istate = ostate
        step += BATCHSIZE * SEQLEN


  136. CODE
    SLIDES
    https://github.com/DonaldWhyte/deep-learning-with-rnns
    http://donaldwhyte.co.uk/deep-learning-with-rnns


  137. COME TO OUR WORKSHOP!


  138. GET IN TOUCH
    [email protected] .io
    @donald_whyte
    https://github.com/DonaldWhyte
    [email protected]
    @AxSaucedo
    https://github.com/axsauze


  139. SOURCES
    Martin Görner - Tensorflow RNN Shakespeare
    Understanding LSTMs
    Composing Music with Recurrent Neural Networks
