TensorFlow – “Deferred Execution” Model
• Graph first. Computation afterward.
import tensorflow as tf
x = tf.constant(10)
y = tf.Variable(x + 5)
print(y)  # prints the Variable object, not 15: no computation has run yet
Slide 16
TensorFlow – “Deferred Execution” Model
• Graph first. Computation afterward.
import tensorflow as tf
x = tf.constant(10)
y = tf.Variable(x + 5)
model = tf.global_variables_initializer()
with tf.Session() as session:
    session.run(model)       # initialize the variables
    print(session.run(y))    # now the graph runs and prints 15
Slide 17
No content
Slide 18
No content
Slide 19
Packaging the App and the Model
Slide 20
QUANTIZATION
Compress. And Compress More.
Slide 21
Quantization
• Round it up
  • Transform: round_weights
  • Compression rate: ~8% => ~70% (rounded weights compress far better)
• Shrink down node names
  • Transform: obfuscate_names
• Eight-bit calculations
  • Transform: quantize_weights (and quantize_nodes)
Implementation
1. Load the model
2. Feed in the input
3. Run the model
4. Fetch the output
// Loads the frozen graph that ships inside the APK's assets
TensorFlowInferenceInterface inferenceInterface =
        new TensorFlowInferenceInterface(assetManager, modelFile);
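The slides assume assetManager and modelFile already exist. A minimal sketch of where they typically come from, assuming the frozen graph is bundled under src/main/assets and the app depends on the org.tensorflow:tensorflow-android library; the activity name and the file name optimized_graph.pb are placeholders, not from the slides:

import android.app.Activity;
import android.content.res.AssetManager;
import android.os.Bundle;

import org.tensorflow.contrib.android.TensorFlowInferenceInterface;

public class ClassifierActivity extends Activity {

    // Placeholder: whatever .pb file was packaged into src/main/assets
    private static final String MODEL_FILE = "file:///android_asset/optimized_graph.pb";

    private TensorFlowInferenceInterface inferenceInterface;

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);

        // The AssetManager lets the native TensorFlow runtime read the model
        // straight out of the APK.
        AssetManager assetManager = getAssets();
        inferenceInterface = new TensorFlowInferenceInterface(assetManager, MODEL_FILE);
    }
}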
Slide 29
Implementation
1. Load the model
2. Feed in the input
3. Run the model
4. Fetch the output
// feed(String inputName, float[] src, long... dims)
// dims here: batch of 1, height, width, 3 color channels (NHWC)
inferenceInterface.feed(inputName, floatValues, 1, inputSize, inputSize, 3);
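The slides do not show how floatValues is built. A common approach, borrowed from the TensorFlow Android image-classification demo, is to unpack a scaled Bitmap into a normalized float array; INPUT_SIZE, IMAGE_MEAN and IMAGE_STD below are illustrative, model-specific assumptions:

import android.graphics.Bitmap;

public class ImagePreprocessor {
    private static final int INPUT_SIZE = 224;       // assumed network input resolution
    private static final float IMAGE_MEAN = 117.0f;  // assumed normalization constants
    private static final float IMAGE_STD = 1.0f;

    // Converts a Bitmap (already scaled to INPUT_SIZE x INPUT_SIZE) into the flat
    // float array that feed() expects: one R, G, B triple per pixel.
    public static float[] toFloatValues(Bitmap bitmap) {
        int[] intValues = new int[INPUT_SIZE * INPUT_SIZE];
        float[] floatValues = new float[INPUT_SIZE * INPUT_SIZE * 3];

        bitmap.getPixels(intValues, 0, bitmap.getWidth(), 0, 0,
                bitmap.getWidth(), bitmap.getHeight());

        for (int i = 0; i < intValues.length; ++i) {
            int pixel = intValues[i];
            floatValues[i * 3 + 0] = (((pixel >> 16) & 0xFF) - IMAGE_MEAN) / IMAGE_STD;
            floatValues[i * 3 + 1] = (((pixel >> 8) & 0xFF) - IMAGE_MEAN) / IMAGE_STD;
            floatValues[i * 3 + 2] = ((pixel & 0xFF) - IMAGE_MEAN) / IMAGE_STD;
        }
        return floatValues;
    }
}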
Slide 30
Implementation
1. Load the model
2. Feed in the input
3. Run the model
4. Fetch the output
// run(String[] outputNames)
inferenceInterface.run(outputNames);
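The slides do not show how outputNames is built; for a single-output classifier it is usually a one-element array holding the output node's name (the name "output" below is a placeholder, not from the slides):

// Placeholder node name; a real model defines its own output node.
String outputName = "output";
String[] outputNames = new String[] {outputName};

// Runs the graph until every node listed in outputNames has been evaluated.
inferenceInterface.run(outputNames);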
Slide 31
Implementation
1. Load the model
2. Feed in the input
3. Run the model
4. Fetch the output
// fetch(String outputName, float[] dst)
// dst must be pre-allocated to the size of the output tensor
inferenceInterface.fetch(outputName, outputs);
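Putting the four steps together, a complete inference call might look like the sketch below. The node names, input size and class count are placeholders rather than values from the slides:

import android.content.res.AssetManager;

import org.tensorflow.contrib.android.TensorFlowInferenceInterface;

public class TinyClassifier {
    // All names and sizes below are model-specific placeholders.
    private static final String MODEL_FILE = "file:///android_asset/optimized_graph.pb";
    private static final String INPUT_NAME = "input";
    private static final String OUTPUT_NAME = "output";
    private static final int INPUT_SIZE = 224;
    private static final int NUM_CLASSES = 1001;

    private final TensorFlowInferenceInterface inferenceInterface;

    public TinyClassifier(AssetManager assetManager) {
        // 1. Load the model
        inferenceInterface = new TensorFlowInferenceInterface(assetManager, MODEL_FILE);
    }

    public float[] classify(float[] floatValues) {
        float[] outputs = new float[NUM_CLASSES];

        // 2. Feed in the input: batch of 1, RGB image in NHWC layout
        inferenceInterface.feed(INPUT_NAME, floatValues, 1, INPUT_SIZE, INPUT_SIZE, 3);

        // 3. Run the model up to the output node
        inferenceInterface.run(new String[] {OUTPUT_NAME});

        // 4. Fetch the output scores
        inferenceInterface.fetch(OUTPUT_NAME, outputs);
        return outputs;
    }
}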