Slide 1

Slide 1 text

New Features in TensorFlow
Sourabh Bajaj, Software Engineer, Google Brain
@sb2nov

Slide 2

Slide 2 text

Features in Active Development
Everything Subject to Change
Please do not tweet/blog/publicize these slides :)

Slide 3

Slide 3 text

+ TensorFlow Eager Mode
+ TensorFlow Datasets

Slide 4

Slide 4 text

TensorFlow Eager Execution
As simple as possible
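The code on the following slides assumes eager execution is already enabled and that tfe aliases the eager module; a minimal setup sketch (the contrib module path is an assumption from the preview releases, and later versions expose tf.enable_eager_execution instead):

import tensorflow as tf
import tensorflow.contrib.eager as tfe  # preview-era home of the eager APIs (assumed)

tfe.enable_eager_execution()  # call once, before building any other ops

print(tf.add(1, 2))  # tf.Tensor(3, ...): ops now run immediately and return values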

Slide 5

Slide 5 text

Graphs

Slide 6

Slide 6 text

Graphs can be annoying: Delayed feedback
● Errors are reported long after graph construction
● Not friendly to host-language debuggers/tools
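For contrast, a minimal graph-mode sketch of the delayed feedback: the out-of-range gather from slide 10 builds without complaint, and the InvalidArgumentError only surfaces once the session runs the op.

import tensorflow as tf

x = tf.gather([0, 1, 2], 7)  # no error at graph-construction time; this just adds a node

with tf.Session() as sess:
    sess.run(x)  # InvalidArgumentError is raised only here, far from the offending line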

Slide 7

Slide 7 text

Graphs can be annoying: Metaprogramming
● Control-flow constructs (tf.while_loop) differ from the host language
● Can't use Python data structures easily
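As a small illustration of the mismatch, here is a counting loop written with tf.while_loop in graph mode; the condition and body must be TensorFlow ops rather than ordinary Python control flow (this mirrors the standard documentation example):

import tensorflow as tf

i = tf.constant(0)
cond = lambda i: tf.less(i, 10)   # loop condition expressed as a graph op
body = lambda i: tf.add(i, 1)     # loop body expressed as a graph op
result = tf.while_loop(cond, body, [i])

with tf.Session() as sess:
    print(sess.run(result))       # 10

Compare with the plain Python while loop on slide 13, which runs unchanged under eager execution.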

Slide 8

Slide 8 text

Boilerplate: Code like this...

x = tf.placeholder(tf.float32, shape=[1, 1])
m = tf.matmul(x, x)

print(m)
# Tensor("MatMul:0", shape=(1, 1), dtype=float32)

with tf.Session() as sess:
    m_out = sess.run(m, feed_dict={x: [[2.]]})

print(m_out)
# [[4.]]

Slide 9

Slide 9 text

Boilerplate: Becomes this

x = [[2.]]
m = tf.matmul(x, x)

print(m)
# tf.Tensor([[4.]], dtype=float32, shape=(1, 1))

Slide 10

Slide 10 text

Instant Errors

x = tf.gather([0, 1, 2], 7)
# InvalidArgumentError: indices = 7 is not in [0, 3) [Op:Gather]

Slide 11

Slide 11 text

Metaprogramming: Each iteration adds nodes to the graph

x = tf.random_uniform([2, 2])
with tf.Session() as sess:
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            print(sess.run(x[i, j]))

Slide 12

Slide 12 text

Metaprogramming

x = tf.random_uniform([2, 2])
for i in range(x.shape[0]):
    for j in range(x.shape[1]):
        print(x[i, j])

Slide 13

Slide 13 text

Python Control Flow

a = tf.constant(6)
while a != 1:
    if a % 2 == 0:
        a = a / 2
    else:
        a = 3 * a + 1
    print(a)

# Outputs
# tf.Tensor(3, dtype=int32)
# tf.Tensor(10, dtype=int32)
# tf.Tensor(5, dtype=int32)
# tf.Tensor(16, dtype=int32)
# tf.Tensor(8, dtype=int32)
# tf.Tensor(4, dtype=int32)
# tf.Tensor(2, dtype=int32)
# tf.Tensor(1, dtype=int32)

Slide 14

Slide 14 text

Gradients

Slide 15

Slide 15 text

Gradients
● Operations executed are recorded on a tape
● The tape is played back to compute gradients

Slide 16

Slide 16 text

Gradients

def square(x):
    return tf.multiply(x, x)  # Or x * x

grad = tfe.gradients_function(square)

print(square(3.))
# tf.Tensor(9., dtype=tf.float32)
print(grad(3.))
# [tf.Tensor(6., dtype=tf.float32)]

Slide 17

Slide 17 text

Gradients

def square(x):
    return tf.multiply(x, x)  # Or x * x

grad = tfe.gradients_function(square)
gradgrad = tfe.gradients_function(lambda x: grad(x)[0])

print(square(3.))
# tf.Tensor(9., dtype=tf.float32)
print(grad(3.))
# [tf.Tensor(6., dtype=tf.float32)]
print(gradgrad(3.))
# [tf.Tensor(2., dtype=tf.float32)]

Slide 18

Slide 18 text

It’s not that different

Slide 19

Slide 19 text

TensorFlow = Operation Kernels + Composition
● Session: one way to compose operations
● Eager execution: compose using Python

Slide 20

Slide 20 text

Using GPUs: tf.device() for manual placement

with tf.device("/gpu:0"):
    x = tf.random_uniform([10, 10])
    y = tf.matmul(x, x)
    # x and y reside in GPU memory

Slide 21

Slide 21 text

Building Models: The same APIs as graph building (tf.layers, tf.train.Optimizer, tf.data, etc.)

model = tf.layers.Dense(units=1, use_bias=True)
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1)

Slide 22

Slide 22 text

Building Models

model = tf.layers.Dense(units=1, use_bias=True)
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1)

# Define a loss function
def loss(x, y):
    return tf.reduce_mean(tf.square(y - model(x)))

Slide 23

Slide 23 text

Training Models: Compute and apply gradients

for (x, y) in get_next_batch():
    optimizer.apply_gradients(grad_fn(x, y))

Slide 24

Slide 24 text

Training Models: Compute and apply gradients

grad_fn = tfe.implicit_gradients(loss)

for (x, y) in get_next_batch():
    optimizer.apply_gradients(grad_fn(x, y))

Slide 25

Slide 25 text

No more graphs then?

Slide 26

Slide 26 text

Graphs are Optimizable
● Automatic buffer reuse
● Constant folding
● Inter-op parallelism
● Automatic trade-off between compute and memory

Slide 27

Slide 27 text

Graphs are Deployable
● TensorFlow Serving
● Mobile
● Any other C++/Java/other program
Without loss in translation between runtimes

Slide 28

Slide 28 text

Graphs are Transformable
● Carve out subgraphs to offload to accelerators
● Train with quantization in mind

Slide 29

Slide 29 text

Graph Functions

Slide 30

Slide 30 text

Graph Functions
● “Compile” Python functions into graphs
● Mix eager execution with calls to “compiled” graphs
● Differentiate through graphs

Slide 31

Slide 31 text

LSTM Cell

def lstm_cell(x, w, h, c):
    xhw = tf.matmul(tf.concat([x, h], axis=1), w)
    y = tf.split(xhw, 4, axis=1)
    in_value = tf.tanh(y[0])
    in_gate, forget_gate, out_gate = [tf.sigmoid(x) for x in y[1:]]
    c = (forget_gate * c) + (in_gate * in_value)
    h = out_gate * tf.tanh(c)
    return h, c

h, c = lstm_cell(x, w, h, c)
print(h)

Slide 32

Slide 32 text

LSTM Cell

@tfe.graph_function
def lstm_cell(x, w, h, c):
    xhw = tf.matmul(tf.concat([x, h], axis=1), w)
    y = tf.split(xhw, 4, axis=1)
    in_value = tf.tanh(y[0])
    in_gate, forget_gate, out_gate = [tf.sigmoid(x) for x in y[1:]]
    c = (forget_gate * c) + (in_gate * in_value)
    h = out_gate * tf.tanh(c)
    return h, c

h, c = lstm_cell(x, w, h, c)
print(h)

Slide 33

Slide 33 text

LSTM Cell (tanh executed in-place)

@tfe.graph_function
def lstm_cell(x, w, h, c):
    xhw = tf.matmul(tf.concat([x, h], axis=1), w)
    y = tf.split(xhw, 4, axis=1)
    in_value = tf.tanh(y[0])
    in_gate, forget_gate, out_gate = [tf.sigmoid(x) for x in y[1:]]
    c = (forget_gate * c) + (in_gate * in_value)
    h = out_gate * tf.tanh(c)
    return h, c

h, c = lstm_cell(x, w, h, c)
print(h)

Slide 34

Slide 34 text

Use existing graph code

@tfe.graph_function
def inception(image):
    logits = inception.inception_v3(image, num_classes=1001, is_training=False)[0]
    return tf.nn.softmax(logits)

inception.restore("/path/to/checkpoint")
print(len(inception.variables))

Slide 35

Slide 35 text

No content

Slide 36

Slide 36 text

TensorFlow Datasets
Data Infeed Made Simple

Slide 37

Slide 37 text

Why are we here?
Input data is the lifeblood of machine learning
Modern accelerators need faster input pipelines
Getting your data into TensorFlow can be painful

Slide 38

Slide 38 text

Feeding

sess.run(…, feed_dict={x: features, y: labels})

All the flexibility of Python…
…and all the performance

Slide 39

Slide 39 text

Queues

files = string_input_producer(…)
record = TFRecordReader().read(files)
parsed = parse_example(record, …)
batch = batch(parsed, 32)

Uses TensorFlow ops to perform preprocessing, but driven by client threads.
“Starting the queue runners”

Slide 40

Slide 40 text

No content

Slide 41

Slide 41 text

How do I switch between training and validation data?
How do I detect the end of an epoch?
How do I handle malformed data?

Slide 42

Slide 42 text

Input pipelines = lazy lists
Functional programming to the rescue!
● Data elements have the same type
● Dataset might be too large to materialize all at once… or infinite
● Compose functions like map() and filter() to preprocess
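As a plain-Python sketch of the lazy-list idea (no TensorFlow involved): the source below is infinite, nothing is materialized up front, and map/filter compose into a pipeline that only does work when it is consumed.

import itertools

def records():            # a possibly infinite source of elements
    n = 0
    while True:
        yield n
        n += 1

# Compose lazy transformations; no element is produced until the pipeline is pulled.
pipeline = map(lambda x: x * 2, filter(lambda x: x % 3 == 0, records()))
print(list(itertools.islice(pipeline, 5)))   # [0, 6, 12, 18, 24]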

Slide 43

Slide 43 text

Input pipelines = lazy lists
Functional programming to the rescue!
A well-studied area, applied in existing languages.
● C# LINQ, Scala collections, Java Streams
Huge literature on optimization (stream fusion etc.)

Slide 44

Slide 44 text

Introducing tf.data
Functional input pipelines in TensorFlow

Slide 45

Slide 45 text

The Dataset interface: Data sources and functional transformations

Create a Dataset from one or more tf.Tensor objects:

Dataset.from_tensors((features, labels))
Dataset.from_tensor_slices((features, labels))
TextLineDataset(filenames)
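A small sketch of how the first two constructors differ (illustrative shapes; assumes the tf.data module path from TensorFlow 1.4, earlier previews used tf.contrib.data): from_tensors keeps the whole pair as a single element, while from_tensor_slices slices along the first dimension.

import tensorflow as tf

features = tf.random_uniform([4, 10])   # illustrative 4-row feature matrix
labels = tf.random_uniform([4])

ds_whole = tf.data.Dataset.from_tensors((features, labels))        # 1 element: the whole pair
ds_rows = tf.data.Dataset.from_tensor_slices((features, labels))   # 4 elements: one per row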

Slide 46

Slide 46 text

The Dataset interface: Data sources and functional transformations

Or create a Dataset from another Dataset:

dataset.map(lambda x: tf.decode_jpeg(x))
dataset.repeat(NUM_EPOCHS)
dataset.batch(BATCH_SIZE)
...and many more.

Slide 47

Slide 47 text

The Dataset interface: Data sources and functional transformations

Or (in TensorFlow 1.4) create a Dataset from a Python generator:

def generator():
    while True:
        yield ...

Dataset.from_generator(generator, tf.int32)

Slide 48

Slide 48 text

# Read records from a list of files.
dataset = TFRecordDataset(["file1.tfrecord", "file2.tfrecord", …])
# Parse string values into tensors.
dataset = dataset.map(lambda record: tf.parse_single_example(record, …))
# Randomly shuffle using a buffer of 10000 examples.
dataset = dataset.shuffle(10000)
# Repeat for 100 epochs.
dataset = dataset.repeat(100)
# Combine 128 consecutive elements into a batch.
dataset = dataset.batch(128)

Slide 49

Slide 49 text

The Iterator interface: Sequential access to Dataset elements

Create an Iterator from a Dataset:

dataset.make_one_shot_iterator()
dataset.make_initializable_iterator()
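The one-shot variant is shown on the next slides; as a sketch of the initializable variant (file names are hypothetical), which also answers the earlier question about switching between training and validation data by re-initializing the same iterator on different files:

import tensorflow as tf

filenames = tf.placeholder(tf.string, shape=[None])
dataset = tf.data.TFRecordDataset(filenames).batch(128)
iterator = dataset.make_initializable_iterator()
next_element = iterator.get_next()

with tf.Session() as sess:
    # Training pass.
    sess.run(iterator.initializer, feed_dict={filenames: ["train.tfrecord"]})
    # ... consume next_element until tf.errors.OutOfRangeError ...

    # Validation pass: re-initialize the same iterator on different files.
    sess.run(iterator.initializer, feed_dict={filenames: ["validation.tfrecord"]})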

Slide 50

Slide 50 text

The Iterator interface: Sequential access to Dataset elements

Get the next element from the Iterator:

next_element = iterator.get_next()
while …:
    sess.run(next_element)

Slide 51

Slide 51 text

dataset = …

# A one-shot iterator automatically initializes itself on first use.
iterator = dataset.make_one_shot_iterator()

# The return value of get_next() matches the dataset element type.
images, labels = iterator.get_next()
train_op = model_and_optimizer(images, labels)

# Loop until all elements have been consumed.
try:
    while True:
        sess.run(train_op)
except tf.errors.OutOfRangeError:
    pass

Slide 52

Slide 52 text

def input_fn():
    dataset = …
    # A one-shot iterator automatically initializes itself on first use.
    iterator = dataset.make_one_shot_iterator()
    # The return value of get_next() matches the dataset element type.
    images, labels = iterator.get_next()
    return images, labels

# The input_fn can be used as a regular Estimator input function.
estimator = tf.estimator.Estimator(…)
estimator.train(input_fn=input_fn, …)

Slide 53

Slide 53 text

tf.data API
● tf.data.Dataset: represents an input pipeline using functional transformations
● tf.data.Iterator: provides sequential access to elements of a Dataset

Slide 54

Slide 54 text

Thank You
Reach out at @sb2nov for questions