
A gentle introduction to TensorFlow


A description of the basic concepts behind TensorFlow, along with some simple examples and their implementation in Python.

Kacper Łukawski

December 06, 2016



1. Running modes

There are three ways in which the created computations can be executed:

• CPU
• GPU (currently only CUDA, but OpenCL support is on the TensorFlow roadmap)
• Distributed mode

2. Computation graphs

TensorFlow uses graphs to represent math computations. Each node can be thought of as a single math operation. The slide shows a simple example of a graph representation for the following equation:

y = sin(exp(2x + 7))

(Diagram: x is multiplied by 2, then 7 is added, and exp and sin are applied in turn to produce y.)

3. Computation graphs - details

• Each node may have zero or more inputs and outputs, and represents a single operation.
• An operation is an abstract computation, while a concrete implementation of it is called a kernel.
• TensorFlow simulates graph execution and tries to find the best device to execute each node on. Estimated completion time and communication cost are also taken into consideration.
• To avoid redundant copies of the same values, common subexpression elimination is performed.
• Values that flow along graph edges are called tensors.

4. Tensors

TensorFlow provides three basic kinds of tensors:

• Constants - immutable values that do not change over time.
• Variables - mutable, mainly used for storing model state.
• Placeholders - input parameters that the computation graph has to be fed with each time it is executed.

import tensorflow as tf

# A constant value
PI = tf.constant(3.14, name='PI')

# A matrix variable
A = tf.Variable(tf.zeros((3, 3)), name='A')

# An input placeholder
x = tf.placeholder(dtype=tf.int32, name='x')

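The slide does not show how these tensors are actually evaluated; a minimal sketch, continuing the definitions above:

with tf.Session() as session:
    # Variables have to be initialized before they are used
    session.run(tf.initialize_all_variables())
    print(session.run(PI))          # 3.14
    print(session.run(A))           # a 3x3 matrix of zeros
    print(session.run(x, {x: 42}))  # placeholders are fed at run time
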
5. Application workflow

A typical TensorFlow application is divided into two parts:

1. Designing a computation graph.
2. Running the computations with concrete numeric values. In TensorFlow terms this is called a session.

import tensorflow as tf

# Graph preparation
x = tf.placeholder(tf.float32, name='x')
y = tf.sin(tf.exp(2.0 * x + 7.0))

# Graph execution within a session
with tf.Session() as session:
    y_val = session.run(y, {x: 1.0})
    print(y_val)

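As a quick sanity check (not in the deck), the same value can be computed directly with NumPy:

import numpy as np

# Should print (up to float precision) the same value as the session above
print(np.sin(np.exp(2.0 * 1.0 + 7.0)))
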
6. Visualization

TensorFlow ships with a tool called TensorBoard, which makes it possible to monitor the execution of a computation graph and to observe how variable values change over time. In order to use TensorBoard, an application has to be adapted to write logs into a given directory. These logs can then be visualized in a browser.

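The deck does not show the adaptation itself; a minimal sketch using the pre-1.0 API current at the time (tf.train.SummaryWriter; the log directory is an arbitrary choice):

import tensorflow as tf

x = tf.placeholder(tf.float32, name='x')
y = tf.sin(tf.exp(2.0 * x + 7.0))

with tf.Session() as session:
    # Write the graph definition so that TensorBoard can render it
    writer = tf.train.SummaryWriter('/tmp/tf_logs', session.graph)
    session.run(y, {x: 1.0})
    writer.close()

# Then run: tensorboard --logdir=/tmp/tf_logs
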
7. Extras

• Saving and restoring state
• Device constraints
• Automatic gradient computation
• Support for HDFS

The first three are sketched below.

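A minimal sketch (not from the deck) of the first three features together; the checkpoint path and numeric values are illustrative:

import tensorflow as tf

x = tf.Variable(3.0, name='x')
y = x * x

# Automatic gradient computation: dy/dx = 2x
grad = tf.gradients(y, [x])[0]

# Device constraint: pin an operation explicitly to the CPU
with tf.device('/cpu:0'):
    z = y + 1.0

# Saving and restoring variable state
saver = tf.train.Saver()

with tf.Session() as session:
    session.run(tf.initialize_all_variables())
    print(session.run(grad))  # 6.0
    print(session.run(z))     # 10.0
    saver.save(session, '/tmp/model.ckpt')  # illustrative path
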
8. Designing AI models

A majority of AI architectures can be described in terms of math formulas. Typically they have some parameters whose values have to be learned from a provided training set during the training phase of the model. The simplest architecture, and quite a good example, is, as usual, a perceptron: a weighted sum of the inputs passed through an activation function, here y = tanh(wᵀx + b).

9. Designing AI models

from helper import get_random_sample
import tensorflow as tf
import numpy as np

INPUT_SIZE = 2
DATASET_SIZE = 25
LEARNING_RATE = 0.01

# All the tensors
x = tf.placeholder(shape=(INPUT_SIZE, 1), dtype=tf.float32, name="x")
w = tf.Variable(tf.random_normal((INPUT_SIZE, 1)), name="weights")
b = tf.Variable(1.0, name="bias")
y = tf.tanh(tf.matmul(w, x, transpose_a=True) + b)
target = tf.placeholder(dtype=tf.float32, name="target")

10. Fitting

We know how the computations will be done, but we still need to determine the values of the model parameters: the weights and the bias, in the case of the presented perceptron. We will, of course, not do it manually. In our example we created a perceptron with two inputs and a single output. Let's say we would like it to return -1 for all the blue points and 1 for the red ones.

(Plot: blue and red points in the unit square, separated by a dotted line.)

11. Fitting

The fitting phase can simply be described as a way of obtaining the parameters that minimize the error of the model's predictions. For this purpose we need to define how to compute that error. The function calculating it is usually called a cost function. For the purposes of our example we will use the mean squared error, defined as follows:

MSE = (1/n) · Σᵢ (yᵢ - tᵢ)²

where yᵢ is the model's output and tᵢ is the target value for the i-th sample.

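In TensorFlow this cost can be expressed, for instance, as below (the deck's actual cost tensor appears on slide 13):

# Mean squared error between the model output and the target
mse = tf.reduce_mean(tf.squared_difference(y, target))
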
12. Evaluating

For this simple example we will create a training dataset by randomly sampling examples from the [0, 1) x [0, 1) square and assigning each one a target class depending on its position relative to the dotted line. In the fitting phase, the model will get the vector as an input and the class as the target output, and will try to adapt its parameters. To check the output of the created model, we will simply execute it on four examples for which we know the expected output.

(Plot: the four test points (0.1, 0.1), (0.5, 0.5), (0.7, 0.7) and (0.9, 0.9) on the unit square.)

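The helper module imported on slide 9 is not shown in the deck; a minimal sketch consistent with the description above. The exact position of the dotted line is an assumption (here x1 + x2 = 1.5, which is consistent with the results reported on slide 15):

# helper.py - hypothetical reconstruction
import random

def get_random_sample():
    # Pick a random point from the [0, 1) x [0, 1) square
    x1, x2 = random.random(), random.random()
    # Assumed decision line: red (1) above x1 + x2 = 1.5, blue (-1) below
    target = 1.0 if x1 + x2 > 1.5 else -1.0
    return [x1, x2], target
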
13. Fitting & Evaluating

# Defining the cost function and the way to optimize it
cost = tf.squared_difference(y, target, name="cost")
optimizer = tf.train.GradientDescentOptimizer(
    learning_rate=LEARNING_RATE).minimize(cost)

# Build the training dataset
dataset = []
for _ in range(DATASET_SIZE):
    sample_vector, target_value = get_random_sample()
    dataset.append(
        (np.array(sample_vector).reshape((INPUT_SIZE, 1)),
         target_value))

14. Fitting & Evaluating

# Fit the model and run it on the examples
init_op = tf.initialize_all_variables()
with tf.Session() as session:
    # But initialize the variables first...
    session.run(init_op)
    for epoch in range(50000):
        for sample, target_value in dataset:
            _, epoch_cost = session.run([optimizer, cost], {
                x: sample,
                target: target_value
            })

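The evaluation code itself is not shown in the deck; a minimal sketch, run inside the same session once the training loop has finished:

    # Run the trained perceptron on the four known test points
    for point in [(0.1, 0.1), (0.5, 0.5), (0.7, 0.7), (0.9, 0.9)]:
        sample = np.array(point).reshape((INPUT_SIZE, 1))
        print(point, session.run(y, {x: sample}))
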
15. Results

Our classifier was executed on the four examples shown before. The chart on the slide shows the outputs of the created perceptron model: approximately -1, -1, -0.908 and 0.999 for the points (0.1, 0.1), (0.5, 0.5), (0.7, 0.7) and (0.9, 0.9) respectively. As can easily be checked, for the red point we got a value very close to 1, and values at or close to -1 for the blue ones.

16. Problem definition

The "CMU Face Images" dataset consists of 128x120px photos of 20 people. For each person there are up to 32 images, taken with different poses and emotions, and with or without sunglasses. Our goal will be to recognize the person in a photo using an AI architecture. Out of laziness, we will not even try to extract any features from the images; we will just prepare a solution that works on the raw images.

17. Suggested architecture

The input layer of our network will receive an input image as a vector of size 128x120. The dataset contains photos of 20 different people, so there are 20 classes. The target output vector will have 1 at the position of the class the input image belongs to, and 0 for all the others.

input_vector = tf.placeholder(
    dtype=tf.float32,
    shape=(None, IMAGE_WIDTH * IMAGE_HEIGHT),
    name="input_vector")
target_vector = tf.placeholder(
    dtype=tf.float32,
    shape=(None, CLASSES_COUNT),
    name="target_vector")

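For illustration (not in the deck), a target vector encoded this way for a hypothetical class index 7:

import numpy as np

CLASSES_COUNT = 20

# One-hot target: 1 at the class position, 0 everywhere else
target = np.zeros((1, CLASSES_COUNT), dtype=np.float32)
target[0, 7] = 1.0
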
18. Suggested architecture

last_layer = input_vector
for i in range(len(HIDDEN_LAYERS_SIZE)):
    # Create weights and biases
    last_layer_shape = last_layer.get_shape()
    weights = tf.Variable(
        tf.random_normal(shape=(int(last_layer_shape[1]),
                                HIDDEN_LAYERS_SIZE[i])),
        name="weights_%i" % (i,))
    biases = tf.Variable(
        tf.constant(INITIAL_BIAS, shape=(1, HIDDEN_LAYERS_SIZE[i])),
        name="biases_%i" % (i,))
    # Create a new hidden layer and set it as the new last one
    last_layer = ACTIVATION_FUNCTION(
        tf.matmul(last_layer, weights) + biases,
        name="layer_%i" % (i,))

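The constants used above are not defined on the slide; plausible values, all of which are assumptions except the image size and the class count:

IMAGE_WIDTH, IMAGE_HEIGHT = 128, 120  # from the dataset description
CLASSES_COUNT = 20                    # 20 people in the dataset
HIDDEN_LAYERS_SIZE = [256, 128]       # assumption: two hidden layers
ACTIVATION_FUNCTION = tf.nn.relu      # assumption
INITIAL_BIAS = 0.1                    # assumption
LEARNING_RATE = 0.001                 # assumption
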
19. Evaluating the model

The training dataset is built from 85% of randomly chosen images; the rest will be used to test the accuracy of our model. The output layer is built in the same manner as the hidden ones. As a cost function we will use cross entropy, which has some useful properties:

# Create the cost function of the created network and the optimizer
cost = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(
        output_vector, target_vector))
optimizer = tf.train.AdamOptimizer(
    learning_rate=LEARNING_RATE).minimize(cost)

# Create the accuracy calculation
correct_prediction = tf.equal(
    tf.arg_max(output_vector, 1),
    tf.arg_max(target_vector, 1))
accuracy = tf.reduce_mean(
    tf.cast(correct_prediction, tf.float32))

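The construction of output_vector is not shown in the deck; a sketch following the same pattern as the hidden layers, but emitting raw logits, since softmax_cross_entropy_with_logits applies the softmax itself:

# Output layer: same pattern as the hidden layers, without the activation
weights = tf.Variable(
    tf.random_normal(shape=(int(last_layer.get_shape()[1]), CLASSES_COUNT)),
    name="output_weights")
biases = tf.Variable(
    tf.constant(INITIAL_BIAS, shape=(1, CLASSES_COUNT)),
    name="output_biases")
output_vector = tf.matmul(last_layer, weights) + biases
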
20. References

❖ TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems, Martín Abadi et al., Google Research, 2015
❖ https://github.com/kacperlukawski/tensorflow-introduction