
ML on Mobile DroidKaigi 2018

rejasupotaro
February 09, 2018


  1. Ready for AI?
    ML on Mobile
    » Machine Learning & TensorFlow
    » Build your own model and optimize it
    DroidKaigi 2018 @rejasupotaro 1


  2. » Kentaro Takiguchi / @rejasupotaro
    » Software Engineer
    » Android (since 2010)
    » → Web
    » → Search
    » → ML
    » Discovery team at Cookpad UK
    2


  3. Discovery
    = Search + Recommendation
    Provide users with the right content at the right time
    3


  4. Deep Learning Achievements Over The
    Past Year
    https://blog.statsbot.co/deep-learning-achievements-4c563e034257 4


  5. Artificial Intelligence, Blockchain, IoT, VR,
    Serverless architecture, ...
    5


  6. Best Apps in 2017
    Socratic, Pinterest, Faceapp 6


  7. Google Flights will now predict airline
    delays before the airlines do
    7


  8. One Day
    » When I send a resume to a company, AI predicts my
    future performance, then they reject without
    having an interview
    8


  9. Another Day
    » When I send a resume to a company, AI predicts my
    future performance, then they reject without
    having an interview
    » When I attempt to buy an insurance, AI detects a
    risk of my future illness, then they reject my
    application
    9


  10. Another Day
    » When I send a resume to a company, AI predicts my
    future performance, then they reject without
    having an interview
    » When I attempt to buy an insurance, AI detects a
    risk of my future illness, then they reject my
    application
    » When I pass in front of a surveillance camera, AI
    detects that I will commit a crime, then I'm
    arrested
    10


  11. Sounds fun!
    11


  12. This is the talk for you
    » ✅ "I'm just interested in AI"
    » Welcome!
    » ✅ "I've investigated the feasibility. I want to
    release"
    » This talk focuses on what's happening inside
    » You will be able to release your app with
    confidence
    12


  13. API development
    is mature
    Server: "Call this API with the param"
    GET /users?id=1
    Client: "Okay"
    » We can tell how to use it just by looking at
    this one line
    13


  14. Productionizing ML
    has just started
    ML: "I trained a model. Use this"
    awesome_model.pb
    Client: ???
    » Machine learning researchers need to know the real
    environment
    » Product developers need to learn machine learning
    14


  15. My Motivation
    It's like Android development around 2012 (Everything
    was difficult...)
    I feel we need to share knowledge.
    I want to share what I learned.
    15


  16. Build
    This
    AI
    App
    Together
    https://github.com/rejasupotaro/logic-gate 16


  17. Why TensorFlow?
    » There are many libraries such as PyTorch, Caffe2,
    Theano, Chainer, ...
    17


  18. Runtime
    We would ❤ the official way
    Android: TensorFlow, Deeplearning4j
    iOS: Core ML, TensorFlow
    Let's ask whether the library they love supports these platforms
    18


  19. Train → (IR) → Inference
    19


  20. What is TensorFlow?
    “TensorFlow is an interface for expressing machine
    learning algorithms, and an implementation for
    executing such algorithms, focusing on a wide variety
    of heterogeneous systems, ranging from mobile devices
    up to large-scale distributed systems of hundreds of
    machines and thousands of computational devices”
    https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/45166.pdf 20


  21. What is TensorFlow?
    https://www.tensorflow.org/programmers_guide/graphs 21


  22. What is TensorFlow?
    import tensorflow as tf

    a = tf.constant(2)
    b = tf.constant(3)

    with tf.Session() as session:
        print(session.run(a + b))  # => 5
    22


  23. Why TensorFlow?
    » There are many libraries such as PyTorch, Caffe2,
    Theano, Chainer, ...
    » Most of these libraries are designed for research,
    but TensorFlow is designed to run models in
    production, for various environments including
    mobile.
    23


  24. Architecture of TensorFlow
    24


  25. TensorFlow Mobile & Lite
    25


  26. Architecture of TensorFlow Lite
    https://developer.android.com/ndk/guides/neuralnetworks/index.html 26


  27. Arm support for Android NNAPI
    gives >4x performance boost
    https://arxiv.org/pdf/1801.06274.pdf 27


  28. TensorFlow Mobile (17.2 MB)
    TensorFlow Lite (1.5 MB)
    28


  29. TensorFlow Mobile & Lite
    TensorFlow Mobile 1.4.0 / TensorFlow Lite 0.1.1
    » TensorFlow Mobile
    » = TensorFlow Java + Android
    » TensorFlow Lite
    » = A minimal implementation optimized for mobile
    and embedded devices
    as of 2018/02/01 29


  30. “Google will make
    Android the best
    platform for
    machine learning”
    The Wonderful World of NDK / Building High Performance Android Apps with NDK (2018/02/08 15:40-16:30) 30


  31. Let's see how to run a trained
    model
    31


  32. Run a trained model
    » Add the dependency.
    implementation "org.tensorflow:tensorflow-android:1.4.0"
    » Put a model in src/main/assets
    » Run the computation graph
    inferenceInterface.feed(inputName, input, shape.first, shape.second)
    inferenceInterface.run(arrayOf(outputName))
    inferenceInterface.fetch(outputName, output)
    » Give [0, 1] → Get [0] (full sketch below)
    32
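    Putting those calls together, here is a minimal Kotlin sketch of the whole flow for the AND-gate model built later in this deck. The asset name (optimized_and.pb) and the node names "x" / "y_pred" are assumptions taken from the training and freeze steps; adjust them to your own graph.

    import android.content.res.AssetManager
    import org.tensorflow.contrib.android.TensorFlowInferenceInterface

    class AndGateClassifier(assets: AssetManager) {
        // Load the frozen graph bundled in src/main/assets
        private val inferenceInterface =
            TensorFlowInferenceInterface(assets, "file:///android_asset/optimized_and.pb")

        fun predict(x1: Float, x2: Float): Float {
            val input = floatArrayOf(x1, x2)
            val output = FloatArray(1)
            inferenceInterface.feed("x", input, 1L, 2L)   // placeholder "x", shape [1, 2]
            inferenceInterface.run(arrayOf("y_pred"))     // run the graph up to "y_pred"
            inferenceInterface.fetch("y_pred", output)    // copy the result into our buffer
            return output[0]                              // close to 1.0 only for (1, 1)
        }
    }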


  33. That's it!
    » It can be done within 5 minutes if you have a
    trained model
    » You don't need to write C++. You don't need to use
    CMake. Just use the library.
    33


  34. Logic AND Gate
    Input → Output
    0, 0 → 0
    0, 1 → 0
    1, 0 → 0
    1, 1 → 1
    34


    » x1, x2: input
    » y: output
    » w, b: variables
    » Formula: y_pred = sigmoid(x · w + b) (written out below)
    35
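    Written out, the model these bullets describe (and the code on the next slide implements) is a single sigmoid unit; w is a 2×1 weight matrix and b a scalar bias:

    $$\hat{y} = \sigma(x W + b), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}, \qquad L = \sum_i \left(\hat{y}_i - y_i\right)^2$$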


  36. Let's define a graph
    36


  37. import tensorflow as tf

    # Placeholders for the inputs and the expected outputs
    x = tf.placeholder(tf.float32, shape=[None, 2], name='x')
    y = tf.placeholder(tf.float32, shape=[None, 1], name='y')

    # Trainable variables: weights and bias
    w = tf.Variable(tf.zeros([2, 1]), name='weight')
    b = tf.Variable(tf.zeros([1]), name='bias')

    # Prediction: a single sigmoid unit
    y_pred = tf.nn.sigmoid(tf.matmul(x, w) + b, name='y_pred')

    with tf.name_scope("loss"):
        loss = tf.reduce_sum(tf.square(y_pred - y), name='loss')

    with tf.name_scope("train"):
        optimizer = tf.train.AdamOptimizer(learning_rate=0.1, name='optimizer')
        train_step = optimizer.minimize(loss, name='train_step')

    with tf.Session() as session:
        session.run(tf.global_variables_initializer())
        for epoch in range(self.args.epochs):  # epochs comes from the sample project's CLI args
            _, summary, l = session.run(
                [train_step, merged, loss],  # merged: TensorBoard summaries defined elsewhere in the sample
                feed_dict={x: input, y: output}
            )
    37


  40. Training
    » Define the problem (inputs and expected outputs)
    » Build the computation graph
    » x, y: tf.placeholder
    » w, b: tf.Variable
    » y_pred: tf.nn.sigmoid(tf.matmul(x, w) + b)
    » Find the appropriate values by iterating training
    steps
    40


  41. Visualizing a Model
    41


  42. Congrats!
    » Once you understand how a model is built, you can
    build other models (by modifying the graph a little)
    » You don't need to hardcode the logic. The machine
    learns how to do it for you.
    42


  43. Hardcode Logic
    fun predict(x1: Int, x2: Int): Int {
        return if (x1 == 0 && x2 == 0) {
            0
        } else if (x1 == 1 && x2 == 0) {
            0
        } else if (x1 == 0 && x2 == 1) {
            0
        } else if (x1 == 1 && x2 == 1) {
            1
        } else {
            throw RuntimeException("I hope it won't happen")
        }
    }
    List all possible cases
    43


  44. Machine Learns
    // [[0, 0], [0, 1], [1, 0], [1, 1]]
    fun predict(x1: Int, x2: Int): Int {
        ??? // Machine learns what value to return
    }
    // => [[0], [0], [0], [1]]
    You don't need to write the logic
    44


  45. XOR Gate
    // [[0, 0], [0, 1], [1, 0], [1, 1]]
    fun predict(x1: Int, x2: Int): Int {
        ??? // Machine learns what value to return
    }
    // => [[0], [1], [1], [0]]
    Nest more layers when we work on a complex problem like XOR, which is not linearly separable (see the sketch below) 45
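    A minimal sketch of what "nest more layers" means, in the same notation as the AND model; the hidden-layer width of 2 is an illustrative choice, not something stated in the deck:

    $$h = \sigma(x W_1 + b_1), \qquad \hat{y} = \sigma(h W_2 + b_2), \qquad W_1 \in \mathbb{R}^{2 \times 2}, \; W_2 \in \mathbb{R}^{2 \times 1}$$

    The hidden layer h is what lets the network represent XOR, which a single sigmoid unit cannot.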


  46. Estimate Weight
    // [170cm, Man, ...]
    fun predict(...): Int {
        ??? // Machine learns what value to return
    }
    // => 70kg
    46


  47. Translation
    » "I am learning machine learning"
    » ! ["i", "am", "learning", "machine", "learning"]
    » ! [0.53177921, 0.7965138, 0.66959208, ...]
    » ! ["ࢲ", "͸", "ػցֶश", "Λ", "ษڧ", "͠", "ͯ",
    "͍", "·͢"]
    » ! "ࢲ͸ػցֶशΛษڧ͍ͯ͠·͢"
    47


  48. Image Classification
    » [#000000, #00001A, ...]
    » → [[0, 0, 0], [0, 0, 0.1], ...]
    » → "Cat" or "Burrito"
    48


  49. Deploy to Production
    » Optimize Model
    » Deploy Model
    49


  50. In practice, we don't run such a
    graph in production
    50


  51. Unnecessary nodes +
    values that no longer need to be variables
    51


  52. Training
    » Define a problem (inputs and expected outputs)
    » Build a computation graph
    » Find the appropriate values by iterating training
    steps
    Inference
    » Get the predicted value
    52


  53. Why we need to transform the graph
    1. To enable the model to run on mobile
    » Supported operations are limited
    2. For better performance
    » Compute resources are limited
    » Shouldn't drain users' phone battery
    53


  54. 1. To enable the model to run on
    mobile
    $ find ./tensorflow/python/ -type f | grep '.py$' | wc -l
    968
    $ find ./tensorflow/java/ -type f | grep '.java$' | wc -l
    42
    Feature                  Python                                               Java
    Run a predefined Graph   tf.import_graph_def, tf.Session                      TensorFlowInferenceInterface
    Graph construction       Yes                                                  -
    Gradients                tf.gradients                                         -
    Functions                tf.python.framework.function.Defun                   -
    Control Flow             tf.cond, tf.while_loop                               -
    Neural Network library   tf.train, tf.nn, tf.contrib.layers, tf.contrib.slim  -
    54


  55. 2. For better performance
              Pixel 2 XL (my phone)                          Oppo A37f (most used)
    Chipset   Qualcomm MSM8996 Snapdragon 821                Qualcomm MSM8916 Snapdragon 410
    CPU       Quad-core (2x2.15 GHz Kryo & 2x1.6 GHz Kryo)   Quad-core 1.2 GHz Cortex-A53
    GPU       Adreno 530                                     Adreno 306
    RAM       4 GB                                           2 GB
    There is a big gap between developers and users 55


  56. Graph Transformation
    Model Size ↓, Performance ↑ (Accuracy ↓)
    56


  57. Freeze: Variable to Constant
    node {
      name: "x"
      op: "Placeholder"
      ...
    }
    node {
      name: "bias"
      op: "VariableV2"
      attr {
        key: "container"
        value {
          s: ""
        }
      }
      attr {
        key: "shape"
        value {
          shape {
            dim {
              size: 1
            }
          }
        }
      }
      ...
    57


  58. Freeze: Variable to Constant
    » Train
    » → Checkpoint file is created
    » → Freeze
    $ bazel build tensorflow/python/tools:optimize_for_inference
    $ bazel-bin/tensorflow/python/tools/optimize_for_inference \
    --input=../logic-gate/logic-gate-python/models/and.pb \
    --output=../logic-gate/logic-gate-python/models/optimized_and.pb \
    --frozen_graph=True \
    --input_names=x \
    --output_names=y_pred
    58


  59. Freeze: Variable to Constant
    node {
      name: "x"
      op: "Placeholder"
      ...
    }
    node {
      name: "bias"
      op: "Const"
      attr {
        key: "value"
        value {
          tensor {
            dtype: DT_FLOAT
            tensor_shape {
              dim {
                size: 1
              }
            }
            float_val: -15.988138198852539
          }
        }
      }
      ...
    bias became Const with a fixed float value 59


  60. Before (Training)
    60


  61. After (Inference)
    61


  62. Performance
    » Computational Cost
    » Model Size
    62


  63. Pre-trained Models for Image
    Classification
    » Inception, AlexNet, MobileNet, ...
    63


  64. Model Comparison
    » Retrained Inception-v3, Retrained MobileNet,
    PatchNet (Modified Inception-v3)
    » Specialized for food
    64


  65. https://speakerdeck.com/lunardog/cooking-with-food-photos 65


  66. MobileNet vs PatchNet
    66


  67. Performance Comparison
                     Retrained Inception-v3   Retrained MobileNet   PatchNet (Modified Inception-v3)
    Model Size       77.4 MB                  11.6 MB               85.9 MB
    Execution Time   05m 44s 166ms            01m 19s 535ms         18m 34s 624ms
    CPU Usage        80%                      50%                   80%
    Memory Usage     500 MB                   150 MB                500 MB
    67


  68. 68


  69. Graph Complexity
    » ≈ Computational Cost
    » ≈ Model Size
    69


  70. Inception-v3, MobileNet, PatchNet 70


  71. Benchmark
    $ bazel build tensorflow/tools/benchmark/benchmark_model
    $ bazel-bin/tensorflow/tools/benchmark/benchmark_model \
    --graph=... \
    --input_layer=... \
    --input_layer_shape=... \
    --input_layer_type=... \
    --output_layer=... \
    ...
    71


  72. Benchmark
                                    Retrained Inception-v3   Retrained MobileNet   PatchNet
    Number of nodes executed        507                      264                   704
    AVG timings (microseconds)      368284                   35551.5               1.28228e+06
    Actual execution time           05m 44s 166ms            01m 19s 535ms         18m 34s 624ms
    How much faster than PatchNet   3x faster                18x faster            -
    72


  73. Graph Transformation
    $ bazel build tensorflow/tools/graph_transforms:transform_graph
    $ bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
    --in_graph=../models/mobilenet.pb \
    --out_graph=../models/optimized-mobilenet.pb \
    --inputs='input_1:0' \
    --outputs='final:0' \
    --transforms='
    strip_unused_nodes(type=float, shape="1,224,224,3")
    remove_nodes(op=Identity, op=CheckNumerics)
    fold_constants(ignore_errors=true)
    fold_batch_norms
    fold_old_batch_norms
    quantize_weights
    sort_by_execution_order'
    73


  74. Graph Transformation
                               Original        quantize_weights   More Options
    Model Size                 12 MB           3.3 MB             3.1 MB
    Accuracy                   0.97            0.97               0.9633
    Benchmark                  01m 19s 535ms   01m 17s 716ms      01m 13s 816ms
    CPU Usage                  40-60%          40-60%             40-60%
    Memory Usage               120 MB          90 MB              90 MB
    Number of nodes executed   264             264                236
    (Model: MobileNet) 74


  75. Accuracy
    Basically, accuracy should be the same if we run the
    same computation graph with the same inputs but...
    » Computation Graph
    » Transformed for mobile
    » Data preprocessing
    » Vectorization
    » Feature Scaling
    75


  76. Feature Scaling
    Make sure that features are on the same scale.
    Otherwise, one particular feature with a broad range of
    values dominates the result
    76


  77. Normalization (Number)
    Rescaling
    Mean normalization
    Standardization (formulas below)
    77
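    The usual definitions of the three normalizations named above, where x is a feature value, μ its mean, and σ its standard deviation:

    $$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \;\text{(rescaling)}, \qquad x' = \frac{x - \mu}{x_{\max} - x_{\min}} \;\text{(mean normalization)}, \qquad x' = \frac{x - \mu}{\sigma} \;\text{(standardization)}$$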


  78. Normalization (Color)
    val pixels: IntArray → How to process this data? (one option sketched below)
    78
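    One common answer, sketched in Kotlin: unpack each ARGB pixel into its channels and rescale each channel to [0, 1]. The exact preprocessing must match what the model was trained with, so treat this as an assumption rather than the deck's own code.

    fun pixelsToFloats(pixels: IntArray): FloatArray {
        val floats = FloatArray(pixels.size * 3)
        pixels.forEachIndexed { i, pixel ->
            floats[i * 3]     = ((pixel shr 16) and 0xFF) / 255f  // R
            floats[i * 3 + 1] = ((pixel shr 8) and 0xFF) / 255f   // G
            floats[i * 3 + 2] = (pixel and 0xFF) / 255f           // B
        }
        return floats
    }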


  79. Load from Asset Manager / File
    // Bundled model: load directly from src/main/assets (modelName is the asset path)
    constructor(assets: AssetManager) {
        this.inferenceInterface = TensorFlowInferenceInterface(assets, modelName)
    }

    // Downloaded model: load from a file on disk
    constructor(file: File) {
        this.inferenceInterface = file.inputStream().use {
            TensorFlowInferenceInterface(it)
        }
    }
    79


  80. Still worry about app size?
    » Put the model in assets (bundling)
    » Pros: No need to manage versions
    » Cons: Increased app size
    » Download a model into a file (sketched below)
    » Pros: Reduced app size
    » Cons: Need to manage versions
    80
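    For the "download a model into a file" option, a rough Kotlin sketch; the URL and file name are hypothetical, the call must run off the main thread, and the resulting File is passed to the File-based constructor shown on the previous slide.

    import java.io.File
    import java.net.URL

    fun downloadModel(url: String, destination: File): File {
        URL(url).openStream().use { input ->
            destination.outputStream().use { output ->
                input.copyTo(output)  // stream the graph to local storage
            }
        }
        return destination
    }

    // e.g. (inside a Context): downloadModel("https://example.com/models/mobilenet.pb", File(filesDir, "mobilenet.pb"))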


  81. Still worry about library size?
    » Splitting APK
    android {
        ...
        splits {
            abi {
                enable true
                reset()
                include 'x86_64', 'x86', 'arm64-v8a', 'armeabi-v7a'
            }
        }
        project.ext.abiCodes = ['x86_64': 1, 'x86': 2, 'arm64-v8a': 3, 'armeabi-v7a': 4].withDefault { 0 }
    }

    android.applicationVariants.all { variant ->
        variant.outputs.each { output ->
            // Give each ABI-specific APK a distinct versionCode
            def baseAbiVersionCode = project.ext.abiCodes.get(output.getFilter(OutputFile.ABI))
            if (baseAbiVersionCode != null) {
                output.versionCodeOverride = baseAbiVersionCode * 1000 + variant.versionCode
            }
        }
    }
    81


  82. Asynchronous download + Splitting APK
    Get the power of AI by just adding 345 MB
    82


  83. [Summary]
    » How to train a model
    » Difference between TensorFlow Mobile and Lite
    » How to transform a graph → Freeze / Optimize / Benchmark
    » How to implement the client side
    » Model is too big? → Consider asynchronous download
    » APK is still too big? → Split the APK
    83


  84. Thank you for listening!
    Any questions?
    @rejasupotaro 84
