Slide 1


A gentle introduction to TensorFlow

Kacper Łukawski

Slide 2


Main functionalities

● Running modes
● Computation graphs
● Tensors
● Application workflow
● Visualization
● Extras

Slide 3


Running modes

There are three ways in which the created computations can be executed:
● CPU
● GPU (currently only CUDA, but OpenCL support is planned on the TensorFlow roadmap)
● Distributed mode

Slide 4


Computation graphs

TensorFlow uses graphs to represent math computations. Each node can be thought of as a single math operation. The figure on the right shows a simple graph representation of the following equation: y = sin(exp(2x + 7))

[Figure: computation graph with nodes x and the constants 2 and 7 feeding ×, +, exp and sin, producing y]

Slide 5


Computation graphs - details

● Each node represents a single operation and may have zero or more inputs and outputs.
● An operation is an abstract computation, while a concrete implementation is called a kernel.
● TensorFlow simulates graph execution and tries to find the best device to execute a particular node. Estimated completion time and communication cost are also taken into consideration.
● To avoid redundant copies of the same values, common subexpression elimination is performed.
● Values that flow along graph edges are called tensors.
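To make the node/operation vocabulary concrete, here is a minimal sketch (assuming the TensorFlow 1.x-era Python API used throughout this deck) that builds a tiny graph and lists the operation nodes it contains:

import tensorflow as tf

# Building tensors adds operation nodes to the default graph.
a = tf.constant(2.0, name='a')
b = tf.constant(3.0, name='b')
c = tf.add(a, b, name='c')

# Each node is an operation with a name and a type.
for op in tf.get_default_graph().get_operations():
    print(op.name, op.type)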

Slide 6


Tensors

TensorFlow delivers three basic types of tensors:
● Constants - immutable values that do not change over time.
● Variables - mutable values, mainly for storing the state the model needs.
● Placeholders - input parameters that the computation graph needs to be fed with each time it is executed.

import tensorflow as tf

# A constant value
PI = tf.constant(3.14, name='PI')

# A matrix variable
A = tf.Variable(tf.zeros((3, 3)), name='A')

# An input placeholder
x = tf.placeholder(dtype=tf.int32, name='x')
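To make the mutability of variables concrete, here is a minimal sketch (using the pre-1.0 API seen throughout these slides) that updates the variable A inside a session:

import tensorflow as tf

A = tf.Variable(tf.zeros((3, 3)), name='A')
update_op = tf.assign(A, tf.ones((3, 3)))  # mutate A in place

with tf.Session() as session:
    session.run(tf.initialize_all_variables())
    print(session.run(A))  # all zeros
    session.run(update_op)
    print(session.run(A))  # all ones after the assignment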

Slide 7


Application workflow

A typical application using TensorFlow is divided into two parts:
1. Designing a computation graph.
2. Running the computations with concrete numeric values. In TensorFlow terms this is called a session.

import tensorflow as tf

# Graph preparation
x = tf.placeholder(tf.float32, name='x')
y = tf.sin(tf.exp(2.0 * x + 7.0))

# Graph execution within a session
with tf.Session() as session:
    y_val = session.run(y, {x: 1.0})
    print(y_val)

Slide 8


Visualization

TensorFlow delivers a tool called TensorBoard which makes it possible to monitor the execution of a computation graph and observe the changes of variable values over time. In order to use TensorBoard, an application has to be adapted to write logs into a given directory. These logs can then be visualized in a browser application.
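A minimal sketch of the adaptation described above, assuming a TensorFlow version where tf.summary.FileWriter is available (older releases called it tf.train.SummaryWriter); the log directory is made up:

import tensorflow as tf

x = tf.placeholder(tf.float32, name='x')
y = tf.sin(tf.exp(2.0 * x + 7.0))

with tf.Session() as session:
    # Write the graph definition to a log directory;
    # run `tensorboard --logdir=/tmp/tf_logs` to browse it.
    writer = tf.summary.FileWriter('/tmp/tf_logs', session.graph)
    print(session.run(y, {x: 1.0}))
    writer.close()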

Slide 9


Extras

● Saving and restoring state
● Device constraints
● Automatic gradient computation
● Support for HDFS
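A brief sketch of the first three items, using the standard entry points of that era (tf.train.Saver, tf.device and tf.gradients); the checkpoint path is hypothetical:

import tensorflow as tf

# Device constraints: pin operations to a concrete device.
with tf.device('/cpu:0'):
    w = tf.Variable(tf.random_normal((3, 1)), name='w')
    loss = tf.reduce_sum(tf.square(w))

# Automatic gradient computation.
grads = tf.gradients(loss, [w])

# Saving and restoring state.
saver = tf.train.Saver()
with tf.Session() as session:
    session.run(tf.initialize_all_variables())
    print(session.run(grads))
    saver.save(session, '/tmp/model.ckpt')  # hypothetical path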

Slide 10


Machine Learning with TensorFlow

● Designing AI models
● Fitting
● Evaluating

Slide 11


Designing AI models

A majority of AI architectures can be described in terms of math formulas. Typically, they have some parameters whose values have to be obtained in the learning phase of the model, using a provided training set. The simplest architecture, and quite a good example, is, as usual, a perceptron.
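For reference (not spelled out on the slide itself), the perceptron built in the code that follows computes, in LaTeX notation:

y = \tanh(w^{\top} x + b)

where w is the weight vector and b the bias, matching the tanh activation used on the next slide.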

Slide 12


Designing AI models

from helper import get_random_sample

import tensorflow as tf
import numpy as np

INPUT_SIZE = 2
DATASET_SIZE = 25
LEARNING_RATE = 0.01

# All the tensors
x = tf.placeholder(shape=(INPUT_SIZE, 1), dtype=tf.float32, name="x")
w = tf.Variable(tf.random_normal((INPUT_SIZE, 1)), name="weights")
b = tf.Variable(1.0, name="bias")
y = tf.tanh(tf.matmul(w, x, transpose_a=True) + b)
target = tf.placeholder(dtype=tf.float32, name="target")

Slide 13


Fitting

We know how the computations will be done, but we still need to determine the values of the model parameters - the weights and the bias, in the case of the presented perceptron. We, of course, won't do it manually. In our example, we created a perceptron with two inputs and just a single output. Let's say we would like it to return -1 for all the blue points and 1 for the red ones.

[Figure: unit square with blue and red points separated by a dotted line]

Slide 14


Fitting

The fitting phase of the model can be simply described as a way of obtaining the parameters which minimize the error of its predictions. For this purpose, we need to define how to compute that error. The function calculating it is usually called a cost function. For the purposes of our example we will use the mean squared error, defined as follows:
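The formula itself did not survive the slide export; the standard definition, consistent with the tf.squared_difference call used later, is:

MSE = \frac{1}{N} \sum_{i=1}^{N} (y_i - t_i)^2

where y_i is the model's output and t_i the target value for the i-th sample.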

Slide 15


Evaluating

For this simple example, we will create a training dataset by randomly picking examples from the [0, 1) x [0, 1) area and assigning them a target class depending on their position relative to the dotted line. In the fitting phase, the model will get the vector as an input and the class as the target output, and will try to adapt its parameters. To check the output of the created model, we will simply execute it on four examples for which we know the expected output.

[Figure: unit square with the four test points (0.1, 0.1), (0.5, 0.5), (0.7, 0.7) and (0.9, 0.9) along the diagonal]
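The helper module imported on slide 12 is not shown in the deck; a minimal sketch of what get_random_sample might look like. The decision boundary x1 + x2 = 1.5 is an assumption chosen only to be consistent with the results on slide 18, not taken from the slides:

import random

def get_random_sample():
    # Pick a random point from the [0, 1) x [0, 1) area.
    x1, x2 = random.random(), random.random()
    # Assign 1 above the (assumed) dotted line, -1 below it.
    target = 1.0 if x1 + x2 > 1.5 else -1.0
    return [x1, x2], target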

Slide 16


Fitting & Evaluating

# Define the cost function and how to optimize it
cost = tf.squared_difference(y, target, name="cost")
optimizer = tf.train.GradientDescentOptimizer(
    learning_rate=LEARNING_RATE).minimize(cost)

# Build the training dataset
dataset = []
for _ in range(DATASET_SIZE):
    sample_vector, target_value = get_random_sample()
    dataset.append(
        (np.array(sample_vector).reshape((INPUT_SIZE, 1)),
         target_value))

Slide 17


Fitting & Evaluating

# Fit the model and run it on the examples
init_op = tf.initialize_all_variables()
with tf.Session() as session:
    # But initialize the variables first...
    session.run(init_op)
    for epoch in range(50000):
        for sample, target_value in dataset:
            _, epoch_cost = session.run([optimizer, cost], {
                x: sample,
                target: target_value
            })
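The evaluation itself is not shown in the deck; a minimal sketch, meant to continue inside the same with tf.Session() block above (the four test points come from slide 15):

    # Run the trained perceptron on the four known test points.
    for point in [(0.1, 0.1), (0.5, 0.5), (0.7, 0.7), (0.9, 0.9)]:
        sample = np.array(point).reshape((INPUT_SIZE, 1))
        print(point, session.run(y, {x: sample}))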

Slide 18


Results

Our classifier was executed on the four examples shown before. The chart shows the output of the created perceptron model: approximately -1, -1, -0.908 and 0.999 for the four test points. As can easily be checked, for the red point we obtained almost 1, and values equal or close to -1 for the blue ones.

[Chart: perceptron outputs for the four test points]

Slide 19


An example: Feedforward neural network

● Problem definition
● Suggested architecture
● Evaluating model
● Results

Slide 20


Problem definition

The “CMU Face Images” dataset consists of 128x120px photos of 20 people. For each person there are up to 32 images taken with different poses, emotions, and with or without sunglasses. Our goal will be to recognize a person from a photo using an AI architecture. For the sake of laziness, we won't even try to extract any features from the images, but will just try to prepare a solution working on the raw images.
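The data loading step is not shown in the deck; a minimal sketch of turning one photo into a raw input vector, assuming the dataset's PGM files are readable with Pillow (the file path is hypothetical):

import numpy as np
from PIL import Image

IMAGE_WIDTH, IMAGE_HEIGHT = 128, 120

# Load one photo and flatten it into a vector of raw pixel values.
image = Image.open('faces/an2i_left_angry_open.pgm')  # hypothetical path
vector = np.asarray(image, dtype=np.float32).reshape(
    IMAGE_WIDTH * IMAGE_HEIGHT) / 255.0  # scale pixels to [0, 1]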

Slide 21


Problem definition

Slide 22


Suggested architecture

Slide 23


Suggested architecture

The input layer of our network will receive an input image as a vector of size 128x120. The dataset contains photos of 20 different people, so there are 20 classes. The target output vector will have 1 at the position of the class that the input image belongs to, and 0 at all the others.

input_vector = tf.placeholder(
    dtype=tf.float32,
    shape=(None, IMAGE_WIDTH * IMAGE_HEIGHT),
    name="input_vector")
target_vector = tf.placeholder(
    dtype=tf.float32,
    shape=(None, CLASSES_COUNT),
    name="target_vector")
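The one-hot target encoding described above can be built, for example, like this (a sketch; person_index is a hypothetical label in the range 0..19):

import numpy as np

CLASSES_COUNT = 20

def one_hot(person_index):
    # 1 at the position of the correct class, 0 everywhere else.
    return np.eye(CLASSES_COUNT, dtype=np.float32)[person_index]

print(one_hot(3))  # [0. 0. 0. 1. 0. ... 0.]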

Slide 24


Suggested architecture

last_layer = input_vector
for i in range(len(HIDDEN_LAYERS_SIZE)):
    # Create weights and biases
    last_layer_shape = last_layer.get_shape()
    weights = tf.Variable(
        tf.random_normal(shape=(int(last_layer_shape[1]),
                                HIDDEN_LAYERS_SIZE[i])),
        name="weights_%i" % (i,))
    biases = tf.Variable(
        tf.constant(INITIAL_BIAS, shape=(1, HIDDEN_LAYERS_SIZE[i])),
        name="biases_%i" % (i,))
    # Create a new hidden layer and set it as the new last one
    last_layer = ACTIVATION_FUNCTION(
        tf.matmul(last_layer, weights) + biases,
        name="layer_%i" % (i,))

Slide 25


Evaluating model

The training dataset is built from 85% of randomly chosen images. The rest will be used to test the accuracy of our model. The output layer is built in the same manner as the hidden ones. As a cost function we will use cross entropy, which has some useful properties:

# Create the cost function of the created network and the optimizer
cost = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(
        output_vector, target_vector))
optimizer = tf.train.AdamOptimizer(
    learning_rate=LEARNING_RATE).minimize(cost)

# Create accuracy calculation
correct_prediction = tf.equal(
    tf.arg_max(output_vector, 1),
    tf.arg_max(target_vector, 1))
accuracy = tf.reduce_mean(
    tf.cast(correct_prediction, tf.float32))
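The output layer itself does not appear on the slides; a sketch following the same pattern as the hidden-layer loop on slide 24, but without an activation, since softmax_cross_entropy_with_logits expects raw logits (the variable names here are assumptions):

# Linear output layer producing one logit per class.
last_shape = last_layer.get_shape()
output_weights = tf.Variable(
    tf.random_normal(shape=(int(last_shape[1]), CLASSES_COUNT)),
    name="output_weights")
output_biases = tf.Variable(
    tf.constant(INITIAL_BIAS, shape=(1, CLASSES_COUNT)),
    name="output_biases")
output_vector = tf.matmul(last_layer, output_weights) + output_biases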

Slide 26


Accuracy: 70.2128%

Slide 27


Questions & Answers

Slide 28


References:
❖ Martín Abadi et al., “TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems”, Google Research, 2015
❖ https://github.com/kacperlukawski/tensorflow-introduction