Michelle Fullwood - A gentle introduction to deep learning with TensorFlow

Deep learning's explosion of spectacular results over the past few years may make it appear esoteric and daunting, but in reality, if you are familiar with traditional machine learning, you're more than ready to start exploring deep learning. This talk aims to gently bridge the divide by demonstrating how deep learning operates on core machine learning concepts and getting attendees started coding deep neural networks using Google's TensorFlow library.

https://us.pycon.org/2017/schedule/presentation/2/

PyCon 2017

May 21, 2017

Transcript

  1. A GENTLE INTRODUCTION TO DEEP LEARNING WITH TENSORFLOW Michelle Fullwood

    @michelleful Slides: michelleful.github.io/PyCon2017
  2. PREREQUISITES Knowledge of concepts of supervised ML Familiarity with linear

    and logistic regression
  3. TARGET (Deep) Feed-forward neural networks How they're constructed Why they

    work How to train and optimize them Image source: Fjodor van Veen (2016) Neural Network Zoo
  4. DEEP LEARNING LEARNING CURVE

  5. DEEP LEARNING LEARNING CURVE

  6. DEEP LEARNING LEARNING CURVE

  7. DEEP LEARNING LEARNING CURVE

  8. DEEP LEARNING LEARNING CURVE

  9. Traditional machine learning Deep learning

  10. TENSORFLOW Popular deep learning toolkit From Google Brain, Apache-licensed Python

    API, makes calls to C++ back-end Works on CPUs and GPUs
  11. LINEAR REGRESSION FROM SCRATCH

  12. LINEAR REGRESSION

  13. INPUTS

  14. INPUTS

    import numpy as np

    X_train = np.array([
        [1250, 350, 3],
        [1700, 900, 6],
        [1400, 600, 3]
    ])
    Y_train = np.array([345000, 580000, 360000])
  15. MODEL Multiply each feature by a weight and add them

    up. Add an intercept to get our final estimate.
  16. MODEL

  17. MODEL - PARAMETERS

    weights = np.array([300, -10, -1])
    intercept = -26497
  18. MODEL - OPERATIONS

  19. MODEL - OPERATIONS

    def model(X, weights, intercept):
        return X.dot(weights) + intercept

    Y_hat = model(X_train, weights, intercept)
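
As a quick sanity check (not on the slides), plugging the first training row into the model with the example weights and intercept reproduces its target price:

    x0 = np.array([1250, 350, 3])           # first training example
    y_hat0 = model(x0, weights, intercept)  # 1250*300 - 350*10 - 3*1 - 26497
    print(y_hat0)                           # 345000, matching Y_train[0]
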
  20. MODEL - COST FUNCTION

  21. MODEL - COST FUNCTION

  22. MODEL - COST FUNCTION

  23. COST FUNCTION

    def cost(Y_hat, Y):
        return np.sum((Y_hat - Y)**2)

  24. OPTIMIZATION Hold X and Y constant. Adjust parameters to minimize

    cost.
  25. OPTIMIZATION

  26. TRIAL AND ERROR Image source: Wikimedia Commons

  27. OPTIMIZATION

  28. OPTIMIZATION

  29. OPTIMIZATION - GRADIENT CALCULATION Goal: given $\hat{y} = w_0 x_0 + w_1 x_1 + w_2 x_2 + b$ and $\epsilon = (y - \hat{y})^2$, find $\frac{\partial \epsilon}{\partial w_i}$ and $\frac{\partial \epsilon}{\partial b}$

  30. OPTIMIZATION - GRADIENT CALCULATION Chain rule: $\frac{\partial \epsilon}{\partial w_i} = \frac{d\epsilon}{d\hat{y}} \cdot \frac{\partial \hat{y}}{\partial w_i}$

  31. OPTIMIZATION - GRADIENT CALCULATION $\hat{y} = w_0 x_0 + w_1 x_1 + w_2 x_2 + b \;\Rightarrow\; \frac{\partial \hat{y}}{\partial w_0} = x_0$

  32. OPTIMIZATION - GRADIENT CALCULATION $\epsilon = (y - \hat{y})^2 \;\Rightarrow\; \frac{d\epsilon}{d\hat{y}} = -2(y - \hat{y})$

  33. OPTIMIZATION - GRADIENT CALCULATION $\frac{\partial \hat{y}}{\partial w_0} = x_0$ and $\frac{d\epsilon}{d\hat{y}} = -2(y - \hat{y})$, so $\frac{\partial \epsilon}{\partial w_0} = -2(y - \hat{y})\, x_0$

  34. OPTIMIZATION - GRADIENT CALCULATION $\hat{y} = w_0 x_0 + w_1 x_1 + w_2 x_2 + b \cdot 1 \;\Rightarrow\; \frac{\partial \epsilon}{\partial b} = -2(y - \hat{y}) \cdot 1$
  35. OPTIMIZATION - GRADIENT CALCULATION

    delta_y = y - y_hat
    gradient_weights = -2 * delta_y * x
    gradient_intercept = -2 * delta_y * 1
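
The analytic gradients above can be spot-checked numerically. The sketch below (not from the talk) compares them against centred finite differences at an arbitrary, hypothetical parameter setting, reusing the model() function defined earlier:

    eps = 1e-6
    x, y = X_train[0], Y_train[0]
    w = np.array([250.0, -5.0, 0.0])        # arbitrary test point, not the fitted weights
    b = -20000.0

    y_hat = model(x, w, b)
    analytic = -2 * (y - y_hat) * x         # the slides' gradient: -2 * (y - y_hat) * x_i

    numeric = np.zeros_like(w)
    for i in range(len(w)):
        w_plus, w_minus = w.copy(), w.copy()
        w_plus[i] += eps
        w_minus[i] -= eps
        numeric[i] = ((y - model(x, w_plus, b))**2 -
                      (y - model(x, w_minus, b))**2) / (2 * eps)

    print(np.allclose(analytic, numeric, rtol=1e-3))   # True: analytic and numeric agree
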
  36. OPTIMIZATION - PARAMETER UPDATE

    weights = weights - gradient_weights
    intercept = intercept - gradient_intercept
  37. OPTIMIZATION - OVERSHOOT

  38. OPTIMIZATION - UNDERSHOOT

  39. OPTIMIZATION - PARAMETER UPDATE

    learning_rate = 0.05
    weights = weights - learning_rate * gradient_weights
    intercept = intercept - learning_rate * gradient_intercept
  40. TRAINING

    def training_round(x, y, weights, intercept, alpha=learning_rate):
        # calculate our estimate
        y_hat = model(x, weights, intercept)
        # calculate error
        delta_y = y - y_hat
        # calculate gradients
        gradient_weights = -2 * delta_y * x
        gradient_intercept = -2 * delta_y
        # update parameters
        weights = weights - alpha * gradient_weights
        intercept = intercept - alpha * gradient_intercept
        return weights, intercept
  41. TRAINING

    NUM_EPOCHS = 100

    def train(X, Y):
        # initialize parameters
        weights = np.random.randn(3)
        intercept = 0
        # training rounds
        for i in range(NUM_EPOCHS):
            for (x, y) in zip(X, Y):
                weights, intercept = training_round(x, y, weights, intercept)
        return weights, intercept
  42. TESTING

    def test(X_test, Y_test, weights, intercept):
        Y_predicted = model(X_test, weights, intercept)
        error = cost(Y_predicted, Y_test)
        return np.sqrt(np.mean(error))

    >>> test(X_test, Y_test, final_weights, final_intercept)
    6052.79
  43. Uh, wasn't this supposed to be a talk about neural

    networks? Why are we talking about linear regression?
  44. SURPRISE! YOU'VE ALREADY MADE A NEURAL NETWORK!

  45. LINEAR REGRESSION = SIMPLEST NEURAL NETWORK

  46. ONCE MORE, WITH TENSORFLOW

  47. Inputs Model - Parameters Model - Operations Cost function Optimization

    Train Test
  48. INPUTS → PLACEHOLDERS

    import tensorflow as tf

    X = tf.placeholder(tf.float32, [None, 3])
    Y = tf.placeholder(tf.float32, [None, 1])
  49. PARAMETERS → VARIABLES

    # create tf.Variable(s)
    W = tf.get_variable("weights", [3, 1],
                        initializer=tf.random_normal_initializer())
    b = tf.get_variable("intercept", [1],
                        initializer=tf.constant_initializer(0))
  50. OPERATIONS Y_hat = tf.matmul(X, W) + b

  51. COST FUNCTION cost = tf.reduce_mean(tf.square(Y_hat - Y))

  52. OPTIMIZATION

    learning_rate = 0.05
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)

  53. TRAINING

    with tf.Session() as sess:
        # initialize variables
        sess.run(tf.global_variables_initializer())
        # train
        for _ in range(NUM_EPOCHS):
            for (X_batch, Y_batch) in get_minibatches(
                    X_train, Y_train, BATCH_SIZE):
                sess.run(optimizer, feed_dict={X: X_batch, Y: Y_batch})
  57. # Placeholders
    X = tf.placeholder(tf.float32, [None, 3])
    Y = tf.placeholder(tf.float32, [None, 1])

    # Parameters/Variables
    W = tf.get_variable("weights", [3, 1],
                        initializer=tf.random_normal_initializer())
    b = tf.get_variable("intercept", [1],
                        initializer=tf.constant_initializer(0))

    # Operations
    Y_hat = tf.matmul(X, W) + b

    # Cost function
    cost = tf.reduce_mean(tf.square(Y_hat - Y))

    # Optimization
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)

    # ------------------------------------------------
    # Train
    with tf.Session() as sess:
        # initialize variables
        sess.run(tf.global_variables_initializer())
        # run training rounds
        for _ in range(NUM_EPOCHS):
            for X_batch, Y_batch in get_minibatches(
                    X_train, Y_train, BATCH_SIZE):
                sess.run(optimizer, feed_dict={X: X_batch, Y: Y_batch})
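
The training loops in slides 53 and 57 call a get_minibatches() helper that is never defined in the deck. A minimal sketch of what such a helper might look like (the implementation and the BATCH_SIZE value are assumptions, not from the slides; for the regression example Y_train would also need to be a column vector of shape (n, 1) to match the Y placeholder):

    BATCH_SIZE = 2   # hypothetical value; the deck never specifies one

    def get_minibatches(X, Y, batch_size):
        """Yield (X_batch, Y_batch) pairs in a shuffled order."""
        indices = np.random.permutation(len(X))
        for start in range(0, len(X), batch_size):
            batch = indices[start:start + batch_size]
            yield X[batch], Y[batch]
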
  60. COMPUTATION GRAPH

  61. COMPUTATION GRAPH

  62. FORWARD PROPAGATION

  63. FORWARD PROPAGATION

  64. FORWARD PROPAGATION

  65. FORWARD PROPAGATION

  66. FORWARD PROPAGATION

  67. FORWARD PROPAGATION

    def training_round(x, y, weights, intercept, alpha=learning_rate):
        # calculate our estimate
        y_hat = model(x, weights, intercept)
        # calculate error
        delta_y = y - y_hat
        # calculate gradients
        gradient_weights = -2 * delta_y * x
        gradient_intercept = -2 * delta_y
        # update parameters
        weights = weights - alpha * gradient_weights
        intercept = intercept - alpha * gradient_intercept
        return weights, intercept
  68. BACKPROPAGATION

  69. BACKPROPAGATION

  70. BACKPROPAGATION

  71. BACKPROPAGATION

  72. BACKPROPAGATION

  73. BACKPROPAGATION

  74. BACKPROPAGATION

  75. BACKPROPAGATION

  76. BACKPROPAGATION

  77. BACKPROPAGATION

  78. BACKPROPAGATION

  79. BACKPROPAGATION

    def training_round(x, y, weights, intercept, alpha=learning_rate):
        # calculate our estimate
        y_hat = model(x, weights, intercept)
        # calculate error
        delta_y = y - y_hat
        # calculate gradients
        gradient_weights = -2 * delta_y * x
        gradient_intercept = -2 * delta_y
        # update parameters
        weights = weights - alpha * gradient_weights
        intercept = intercept - alpha * gradient_intercept
        return weights, intercept
  80. VARIABLE UPDATE

  81. VARIABLE UPDATE

  82. VARIABLE UPDATE

  83. VARIABLE UPDATE

    def training_round(x, y, weights, intercept, alpha=learning_rate):
        # calculate our estimate
        y_hat = model(x, weights, intercept)
        # calculate error
        delta_y = y - y_hat
        # calculate gradients
        gradient_weights = -2 * delta_y * x
        gradient_intercept = -2 * delta_y
        # update parameters
        weights = weights - alpha * gradient_weights
        intercept = intercept - alpha * gradient_intercept
        return weights, intercept
  84. NUMPY → TENSORFLOW sess.run(optimizer, feed_dict={ X: X_batch, Y: Y_batch })

  85. TESTING

    with tf.Session() as sess:
        # train
        # ... (code from above)

        # test
        Y_predicted = sess.run(Y_hat, feed_dict={X: X_test})
        squared_error = sess.run(tf.reduce_mean(tf.square(Y_test - Y_predicted)))

    >>> np.sqrt(squared_error)
    5967.39
  86. LOGISTIC REGRESSION

  87. PROBLEM

  88. BINARY CLASSIFICATION

  89. BINARY LOGISTIC REGRESSION - MODEL Take a weighted sum of

    the features and add a bias term to get the logit. Convert the logit to a probability via the logistic-sigmoid function.
  90. BINARY LOGISTIC REGRESSION - MODEL

  91. LOGISTIC-SIGMOID FUNCTION $f(x) = \frac{e^x}{1 + e^x}$
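
In NumPy, the logistic-sigmoid can be written as a one-liner (a sketch, not code from the slides):

    def sigmoid(x):
        # maps any real-valued logit to a probability in (0, 1)
        return 1 / (1 + np.exp(-x))   # equivalent to e**x / (1 + e**x)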

  92. CLASSIFICATION WITH LOGISTIC REGRESSION Image generated with playground.tensorflow.org

  93. MODEL

  94. SOFTMAX Z = np.sum(np.exp(logits))
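
The slide shows only the normalization constant Z. A complete softmax in NumPy might look like this (a sketch, not the slide's code):

    def softmax(logits):
        # exponentiate, then divide by the normalization constant Z;
        # subtracting the max first keeps np.exp from overflowing
        exps = np.exp(logits - np.max(logits))
        Z = np.sum(exps)
        return exps / Z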

  95. MODEL

  96. PLACEHOLDERS

    # X = vector length 784 (= 28 x 28 pixels)
    # Y = one-hot vectors
    # digit 0 = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
    X = tf.placeholder(tf.float32, [None, 28*28])
    Y = tf.placeholder(tf.float32, [None, 10])
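
For reference (not shown in the deck), integer digit labels can be turned into the one-hot rows the Y placeholder expects with a small helper like this:

    def one_hot(labels, num_classes=10):
        # e.g. one_hot([0]) -> [[1, 0, 0, 0, 0, 0, 0, 0, 0, 0]]
        encoded = np.zeros((len(labels), num_classes), dtype=np.float32)
        encoded[np.arange(len(labels)), labels] = 1.0
        return encoded
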
  97. VARIABLES

    # Parameters/Variables
    W = tf.get_variable("weights", [784, 10],
                        initializer=tf.random_normal_initializer())
    b = tf.get_variable("bias", [10],
                        initializer=tf.constant_initializer(0))
  98. OPERATIONS Y_logits = tf.matmul(X, W) + b

  99. COST FUNCTION

    cost = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(logits=Y_logits, labels=Y))

  100. COST FUNCTION Cross entropy: $H(\hat{y}) = -\sum_i y_i \log(\hat{y}_i)$
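
In NumPy terms (a sketch, not from the slides), the cross-entropy between a one-hot label y and a predicted distribution y_hat is:

    def cross_entropy(y, y_hat):
        # y: one-hot true label; y_hat: predicted probabilities (e.g. softmax output)
        return -np.sum(y * np.log(y_hat))
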
  101. OPTIMIZATION

    learning_rate = 0.05
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)

  102. TRAINING

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for i in range(NUM_EPOCHS):
            for (X_batch, Y_batch) in get_minibatches(
                    X_train, Y_train, BATCH_SIZE):
                sess.run(optimizer, feed_dict={X: X_batch, Y: Y_batch})
  103. TESTING

    predict = tf.argmax(Y_logits, 1)

    with tf.Session() as sess:
        # training code from above
        predictions = sess.run(predict, feed_dict={X: X_test})
        accuracy = np.mean(np.argmax(Y_test, axis=1) == predictions)

    >>> accuracy
    0.925
  104. DEFICIENCIES OF LINEAR MODELS Image generated with playground.tensorflow.org

  105. DEFICIENCIES OF LINEAR MODELS Image generated with playground.tensorflow.org

  106. LET'S GO DEEPER!

  107. ADDING ANOTHER LAYER

  108. ADDING ANOTHER LAYER - VARIABLES

    HIDDEN_NODES = 128

    W1 = tf.get_variable("weights1", [784, HIDDEN_NODES],
                         initializer=tf.random_normal_initializer())
    b1 = tf.get_variable("bias1", [HIDDEN_NODES],
                         initializer=tf.constant_initializer(0))
    W2 = tf.get_variable("weights2", [HIDDEN_NODES, 10],
                         initializer=tf.random_normal_initializer())
    b2 = tf.get_variable("bias2", [10],
                         initializer=tf.constant_initializer(0))
  109. ADDING ANOTHER LAYER - OPERATIONS

    hidden = tf.matmul(X, W1) + b1
    y_logits = tf.matmul(hidden, W2) + b2
  110. RESULTS

    # hidden layers | Train accuracy | Test accuracy
    --------------- | -------------- | -------------
    0               | 93.0           | 92.5
    1               | 89.2           | 88.8
  111. IS DEEP LEARNING JUST HYPE? (Well, it's a little bit

    over-hyped...)
  112. PROBLEM A linear transformation of a linear transformation is still

    a linear transformation! We need to add non-linearity to the system.
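
A tiny NumPy demonstration of that point (not from the slides): stacking two linear layers collapses into a single linear layer, so without a non-linearity the extra layer adds no expressive power.

    rng = np.random.RandomState(0)
    x = rng.randn(784)
    W1, b1 = rng.randn(784, 128), rng.randn(128)
    W2, b2 = rng.randn(128, 10), rng.randn(10)

    two_layers = (x.dot(W1) + b1).dot(W2) + b2            # linear, then linear again
    one_layer = x.dot(W1.dot(W2)) + (b1.dot(W2) + b2)     # a single equivalent linear map

    print(np.allclose(two_layers, one_layer))             # True
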
  113. ADDING NON-LINEARITY

  114. ADDING NON-LINEARITY

  115. NON-LINEAR ACTIVATION FUNCTIONS
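
The slide shows a figure of common activation functions. In code (a sketch, not from the deck), the usual choices are:

    def relu(x):
        # rectified linear unit: zeroes out negative inputs
        return np.maximum(0, x)

    def tanh(x):
        # squashes inputs into (-1, 1)
        return np.tanh(x)

    # plus the logistic-sigmoid defined earlier, which squashes inputs into (0, 1)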

  116. ADDING NON-LINEARITY

  117. OPERATIONS

    hidden = tf.nn.relu(tf.matmul(X, W1) + b1)
    y_logits = tf.matmul(hidden, W2) + b2
  118. RESULTS

    # hidden layers | Train accuracy | Test accuracy
    --------------- | -------------- | -------------
    0               | 93.0           | 92.5
    1               | 97.9           | 95.2
  119. WHAT THE HIDDEN LAYER BOUGHT US Image generated with playground.tensorflow.org
  120. WHAT THE HIDDEN LAYER BOUGHT US Image generated with playground.tensorflow.org
  121. ADDING HIDDEN NEURONS 2 hidden neurons Image generated with ConvNetJS

    by Andrej Karpathy
  122. ADDING HIDDEN NEURONS 3 hidden neurons Image generated with ConvNetJS

    by Andrej Karpathy
  123. ADDING HIDDEN NEURONS 4 hidden neurons Image generated with ConvNetJS

    by Andrej Karpathy
  124. ADDING HIDDEN NEURONS 5 hidden neurons Image generated with ConvNetJS

    by Andrej Karpathy
  125. ADDING HIDDEN NEURONS Image generated with ConvNetJS by Andrej Karpathy

  126. ADDING HIDDEN NEURONS Image generated with ConvNetJS by Andrej Karpathy

  127. UNIVERSAL APPROXIMATION THEOREM A feedforward network with a single hidden

    layer containing a finite number of neurons can approximate (basically) any interesting function
  128. ARE WE DEEP LEARNING YET? No!

  129. OPERATIONS

    hidden_1 = tf.nn.relu(tf.matmul(X, W1) + b1)
    hidden_2 = tf.nn.relu(tf.matmul(hidden_1, W2) + b2)
    y_logits = tf.matmul(hidden_2, W3) + b3
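
Slide 129 uses W3 and b3, which the transcribed slides never declare. Assuming a second hidden layer of the same width (an assumption, not from the deck; note that W2's shape also changes from the single-hidden-layer version), the variables could be declared like this:

    HIDDEN_NODES_2 = 128   # assumed width of the second hidden layer

    W2 = tf.get_variable("weights2", [HIDDEN_NODES, HIDDEN_NODES_2],
                         initializer=tf.random_normal_initializer())
    b2 = tf.get_variable("bias2", [HIDDEN_NODES_2],
                         initializer=tf.constant_initializer(0))
    W3 = tf.get_variable("weights3", [HIDDEN_NODES_2, 10],
                         initializer=tf.random_normal_initializer())
    b3 = tf.get_variable("bias3", [10],
                         initializer=tf.constant_initializer(0))
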
  130. WHY GO DEEP? 3 reasons: Deeper networks are more powerful

  131. MORE POWERFUL

  132. WHY GO DEEP? 3 reasons: Deeper networks are more powerful

    Narrower networks are less prone to overfitting
  133. OVERFITTING

  134. LESS PRONE TO OVERFITTING
