ML on Mobile DroidKaigi 2018

Slide 1

Slide 1 text

Ready for AI? ML on Mobile » Machine Learning & TensorFlow » Build your own model and optimize it DroidKaigi 2018 @rejasupotaro

Slide 2

Slide 2 text

» Kentaro Takiguchi / @rejasupotaro » Software Engineer » Android (since 2010) » Web » Search » ML » Discovery team at Cookpad UK

Slide 3

Slide 3 text

Discovery = Search + Recommendation Provide users with the right content at the right time

Slide 4

Slide 4 text

Deep Learning Achievements Over The Past Year https://blog.statsbot.co/deep-learning-achievements-4c563e034257

Slide 5

Slide 5 text

Artificial Intelligence, Blockchain, IoT, VR, Serverless architecture, ...

Slide 6

Slide 6 text

Best Apps in 2017 Socratic, Pinterest, FaceApp

Slide 7

Slide 7 text

Google Flights will now predict airline delays before the airlines do

Slide 8

Slide 8 text

One Day » When I send a resume to a company, AI predicts my future performance, and they reject me without an interview

Slide 9

Slide 9 text

Slide 10

Slide 10 text

Another Day » When I send a resume to a company, AI predicts my future performance, and they reject me without an interview » When I try to buy insurance, AI detects a risk of future illness, and they reject my application » When I pass in front of a surveillance camera, AI predicts that I will commit a crime, and I'm arrested

Slide 11

Slide 11 text

Sounds fun!

Slide 12

Slide 12 text

This is the talk for you » ✅ "I'm just interested in AI" » Welcome! » ✅ "I've investigated the feasibility. I want to release" » This talk focuses on what's happening inside » You will be able to release your app with confidence

Slide 13

Slide 13 text

API development has matured Server: "Call this API with the param" GET /users?id=1 Client: "Okay" » We know how to use it just by looking at this one line

Slide 14

Slide 14 text

Productionizing ML has only just started ML: "I trained a model. Use this" awesome_model.pb Client: ??? » Machine learning researchers need to know the real environment » Product developers need to learn machine learning

Slide 15

Slide 15 text

My Motivation It's like Android development around 2012 (everything was difficult...) I feel we need to share knowledge. I want to share what I learned.

Slide 16

Slide 16 text

Build This AI App Together https://github.com/rejasupotaro/logic-gate

Slide 17

Slide 17 text

Why TensorFlow? » There are many libraries such as PyTorch, Caffe2, Theano, Chainer, ...

Slide 18

Slide 18 text

Runtime We would ❤ the official way
Android: TensorFlow, Deeplearning4j
iOS: Core ML, TensorFlow
Let's ask if the library they love supports these platforms

Slide 19

Slide 19 text

Train → (IR) → Inference

Slide 20

Slide 20 text

What is TensorFlow? “TensorFlow is an interface for expressing machine learning algorithms, and an implementation for executing such algorithms, focusing on a wide variety of heterogeneous systems, ranging from mobile devices up to large-scale distributed systems of hundreds of machines and thousands of computational devices” https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/45166.pdf

Slide 21

Slide 21 text

What is TensorFlow? https://www.tensorflow.org/programmers_guide/graphs

Slide 22

Slide 22 text

What is TensorFlow?
import tensorflow as tf  # TensorFlow 1.x API
a = tf.constant(2)
b = tf.constant(3)
# Nothing is computed until the session runs the graph
with tf.Session() as session:
    print(session.run(a + b))  # => 5
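The snippet above first describes a computation and only evaluates it when `session.run` is called. The following is a minimal pure-Python sketch of that deferred-execution idea. The names `Node`, `constant`, `add`, and `run` are hypothetical and chosen for illustration; they are not TensorFlow APIs, and the real runtime is far more elaborate.

```python
class Node:
    """A graph node: an operation name, its input nodes, and an optional constant value."""

    def __init__(self, op, inputs=(), value=None):
        self.op = op
        self.inputs = tuple(inputs)
        self.value = value


def constant(value):
    """Create a node that holds a fixed value."""
    return Node("Const", value=value)


def add(a, b):
    """Create a node that describes a + b without computing it."""
    return Node("Add", inputs=(a, b))


def run(node):
    """Evaluate a node by recursively evaluating its inputs."""
    if node.op == "Const":
        return node.value
    if node.op == "Add":
        left, right = (run(i) for i in node.inputs)
        return left + right
    raise ValueError("unknown op: " + node.op)


a = constant(2)
b = constant(3)
total = add(a, b)    # only describes the computation
result = run(total)  # the computation happens here
print(result)
```

Separating the description of the graph from its execution is what lets the same graph be saved to a file and executed later by a different runtime, such as one on a phone.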

Slide 23

Slide 23 text

Why TensorFlow? » There are many libraries such as PyTorch, Caffe2, Theano, Chainer, ... » Most libraries are designed for research, but TensorFlow is designed to run models in production in various environments, including mobile.

Slide 24

Slide 24 text

Architecture of TensorFlow

Slide 25

Slide 25 text

TensorFlow Mobile & Lite

Slide 26

Slide 26 text

Architecture of TensorFlow Lite https://developer.android.com/ndk/guides/neuralnetworks/index.html

Slide 27

Slide 27 text

Arm support for Android NNAPI gives >4x performance boost https://arxiv.org/pdf/1801.06274.pdf

Slide 28

Slide 28 text

TensorFlow Mobile (17.2 MB) TensorFlow Lite (1.5 MB)

Slide 29

Slide 29 text

TensorFlow Mobile & Lite
Versions as of 2018/02/01: TensorFlow Mobile 1.4.0, TensorFlow Lite 0.1.1
» TensorFlow Mobile = TensorFlow Java + Android
» TensorFlow Lite = Minimal implementation optimized for mobile and embedded devices

Slide 30

Slide 30 text

“Google will make Android the best platform for machine learning” From the session "Building High Performance Android Apps with NDK" (The Wonderful World of NDK), 2018/02/08 15:40-16:30

Slide 31

Slide 31 text

Let's see how to run a trained model

Slide 32

Slide 32 text

Run a trained model
» Add the dependency.
    implementation "org.tensorflow:tensorflow-android:1.4.0"
» Put a model in src/main/assets
» Run the computation graph
    inferenceInterface.feed(inputName, input, shape.first, shape.second)
    inferenceInterface.run(arrayOf(outputName))
    inferenceInterface.fetch(outputName, output)
» Give [0, 1] → Get [0]

Slide 33

Slide 33 text

That's it! » It can be done within 5 min if I have a trained model » You don't need to write C++. You don't need to use CMake. Just use the library.

Slide 34

Slide 34 text

Logic AND Gate
Input | Output
0, 0 | 0
0, 1 | 0
1, 0 | 0
1, 1 | 1

Slide 35

Slide 35 text

» $x_1$, $x_2$: input » $y$: output » $w$, $b$: variables » Formula: $y = \sigma(x w + b)$, where $\sigma$ is the sigmoid function

Slide 36

Slide 36 text

Let's define a graph

Slide 37

Slide 37 text

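import tensorflow as tf  # TensorFlow 1.x API
# AND gate training data (inputs and expected outputs)
inputs = [[0, 0], [0, 1], [1, 0], [1, 1]]
outputs = [[0], [0], [0], [1]]
epochs = 1000  # assumed value; the original reads it from self.args.epochs
x = tf.placeholder(tf.float32, shape=[None, 2], name='x')
y = tf.placeholder(tf.float32, shape=[None, 1], name='y')
w = tf.Variable(tf.zeros([2, 1]), name='weight')
b = tf.Variable(tf.zeros([1]), name='bias')
# Prediction: sigmoid(x * w + b)
y_pred = tf.nn.sigmoid(tf.matmul(x, w) + b, name='y_pred')
with tf.name_scope("loss"):
    loss = tf.reduce_sum(tf.square(y_pred - y), name='loss')
    tf.summary.scalar('loss', loss)  # assumed: a summary so that merge_all() returns something
with tf.name_scope("train"):
    optimizer = tf.train.AdamOptimizer(learning_rate=0.1, name='optimizer')
    train_step = optimizer.minimize(loss, name='train_step')
merged = tf.summary.merge_all()  # summaries for TensorBoard
with tf.Session() as session:
    session.run(tf.global_variables_initializer())
    for epoch in range(epochs):
        _, summary, l = session.run(
            [train_step, merged, loss],
            feed_dict={x: inputs, y: outputs}
        )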
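To make the training loop less of a black box, here is a rough pure-Python sketch of the same model: one sigmoid unit trained on the AND gate with the sum of squared errors. It is an illustration under stated assumptions, not the code from the repository: it uses plain gradient descent instead of Adam, and the names `train`, `predict`, the learning rate, and the epoch count are chosen for this example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# AND gate truth table: inputs and expected outputs.
inputs = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
targets = [0.0, 0.0, 0.0, 1.0]


def predict(w, b, x):
    """Forward pass: sigmoid(w . x + b)."""
    return sigmoid(w[0] * x[0] + w[1] * x[1] + b)


def train(epochs=20000, learning_rate=0.5):
    """Full-batch gradient descent on the sum of squared errors."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0, 0.0]
        grad_b = 0.0
        for x, y in zip(inputs, targets):
            p = predict(w, b, x)
            # d(loss)/dz for loss = (p - y)^2 and p = sigmoid(z)
            dz = 2.0 * (p - y) * p * (1.0 - p)
            grad_w[0] += dz * x[0]
            grad_w[1] += dz * x[1]
            grad_b += dz
        w[0] -= learning_rate * grad_w[0]
        w[1] -= learning_rate * grad_w[1]
        b -= learning_rate * grad_b
    return w, b


w, b = train()
predictions = [round(predict(w, b, x)) for x in inputs]
print(predictions)
```

The weights and bias start at zero, as in the graph above, and each step nudges them in the direction that reduces the loss. TensorFlow derives the gradient automatically, whereas this sketch writes it out by hand.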

Slide 38

Slide 38 text

Slide 39

Slide 39 text

Slide 40

Slide 40 text

Training » Define the problem (inputs and expected outputs) » Build the computation graph » x: tf.placeholder » w, b: tf.Variable » y_pred: tf.nn.sigmoid(tf.matmul(x, w) + b) » Find the appropriate values by iterating training steps

Slide 41

Slide 41 text

Visualizing a Model

Slide 42

Slide 42 text

Congrats! » Once you understand how a model is built, you can build any model (by modifying the graph a little) » You don't need to hardcode the logic. The machine learns how to do it for you.

Slide 43

Slide 43 text

Hardcode Logic
fun predict(x1: Int, x2: Int): Int {
    return if (x1 == 0 && x2 == 0) {
        0
    } else if (x1 == 1 && x2 == 0) {
        0
    } else if (x1 == 0 && x2 == 1) {
        0
    } else if (x1 == 1 && x2 == 1) {
        1
    } else {
        throw RuntimeException("I hope it won't happen")
    }
}
List all possible cases

Slide 44

Slide 44 text

Machine Learns
// [[0, 0], [0, 1], [1, 0], [1, 1]]
fun predict(x1: Int, x2: Int): Int {
    ??? // Machine learns what value to return
}
// => [[0], [0], [0], [1]]
You don't need to write the logic

Slide 45

Slide 45 text

XOR Gate
// [[0, 0], [0, 1], [1, 0], [1, 1]]
fun predict(x1: Int, x2: Int): Int {
    ??? // Machine learns what value to return
}
// => [[0], [1], [1], [0]]
Nest more layers when we work on a complex problem
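Why XOR needs more layers can be shown with a small sketch. A single unit can only separate its inputs with one straight line, which is enough for AND but not for XOR. With one hidden layer the problem becomes easy. The weights below are hand-picked for illustration rather than learned, and the function names are hypothetical.

```python
def step(z):
    """Threshold activation: 1 if z > 0, else 0."""
    return 1 if z > 0 else 0


def xor_two_layers(x1, x2):
    # Hidden layer: two units that compute OR and AND of the inputs.
    h_or = step(x1 + x2 - 0.5)
    h_and = step(x1 + x2 - 1.5)
    # Output layer: "OR but not AND" is XOR.
    return step(h_or - h_and - 0.5)


cases = [(0, 0), (0, 1), (1, 0), (1, 1)]
outputs = [xor_two_layers(x1, x2) for x1, x2 in cases]
print(outputs)
```

In a trained network the hidden units would not necessarily correspond to OR and AND, but the structure is the same: intermediate features first, then a simple decision on top of them.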

Slide 46

Slide 46 text

Estimate Weight
// [170cm, Man, ...]
fun predict(...): Int {
    ??? // Machine learns what value to return
}
// => 70kg

Slide 47

Slide 47 text

Translation » "I am learning machine learning" » → ["i", "am", "learning", "machine", "learning"] » → [0.53177921, 0.7965138, 0.66959208, ...] » → ["私", "は", "機械学習", "を", "勉強", "し", "て", "い", "ます"] » → "私は機械学習を勉強しています"

Slide 48

Slide 48 text

Image Classification » [#000000, #00001A, ...] » → [[0, 0, 0], [0, 0, 0.1], ...] » → "Cat" or "Burrito"

Slide 49

Slide 49 text

Deploy to Production » Optimize Model » Deploy Model

Slide 50

Slide 50 text

Practically, we don't run such a graph in production

Slide 51

Slide 51 text

Unnecessary nodes + No need to be variables

Slide 52

Slide 52 text

Training » Define a problem (inputs and expected outputs) » Build a computation graph » Find the appropriate values by iterating training steps Inference » Get the predicted value

Slide 53

Slide 53 text

Why we need to transform the graph 1. To enable the model to run on mobile » Supported operations are limited 2. For better performance » Computing resources are restricted » Shouldn't drain users' phone battery

Slide 54

Slide 54 text

1. To enable the model to run on mobile
$ find ./tensorflow/python/ -type f | grep '.py$' | wc -l
968
$ find ./tensorflow/java/ -type f | grep '.java$' | wc -l
42
Feature | Python | Java
Run a predefined Graph | tf.import_graph_def, tf.Session | TensorFlowInferenceInterface
Graph construction | Yes |
Gradients | tf.gradients |
Functions | tf.python.framework.function.Defun |
Control Flow | tf.cond, tf.while_loop |
Neural Network library | tf.train, tf.nn, tf.contrib.layers, tf.contrib.slim |

Slide 55

Slide 55 text

2. For better performance
Spec | Pixel 2 XL (my phone) | Oppo A37f (most used)
Chipset | Qualcomm MSM8996 Snapdragon 821 | Qualcomm MSM8916 Snapdragon 410
CPU | Quad-core (2x2.15 GHz Kryo & 2x1.6 GHz Kryo) | Quad-core 1.2 GHz Cortex-A53
GPU | Adreno 530 | Adreno 306
RAM | 4 GB | 2 GB
There is a big gap between developers and users

Slide 56

Slide 56 text

Graph Transformation Model Size ↓, Performance ↑ (Accuracy ↓)

Slide 57

Slide 57 text

Freeze: Variable to Constant
node {
  name: "x"
  op: "Placeholder"
  ...
}
node {
  name: "bias"
  op: "VariableV2"
  attr {
    key: "container"
    value { s: "" }
  }
  attr {
    key: "shape"
    value { shape { dim { size: 1 } } }
  }
  ...

Slide 58

Slide 58 text

Freeze: Variable to Constant » Train → Checkpoint file is created → Freeze
$ bazel build tensorflow/python/tools:optimize_for_inference
$ bazel-bin/tensorflow/python/tools/optimize_for_inference \
    --input=../logic-gate/logic-gate-python/models/and.pb \
    --output=../logic-gate/logic-gate-python/models/optimized_and.pb \
    --frozen_graph=True \
    --input_names=x \
    --output_names=y_pred

Slide 59

Slide 59 text

Freeze: Variable to Constant
node {
  name: "x"
  op: "Placeholder"
  ...
}
node {
  name: "bias"
  op: "Const"
  attr {
    key: "value"
    value {
      tensor {
        dtype: DT_FLOAT
        tensor_shape { dim { size: 1 } }
        float_val: -15.988138198852539
      }
    }
  }
  ...
bias became Const with a fixed float value
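The before and after listings suggest what freezing does conceptually: every variable node is replaced by a constant node that carries the trained value. The sketch below mimics that on a list of dictionaries. It is a simplified stand-in and not TensorFlow's implementation; the `freeze` function and the dictionary layout are assumptions made for this example.

```python
def freeze(graph_nodes, trained_values):
    """Return a copy of the graph where every variable node becomes a constant node."""
    frozen = []
    for node in graph_nodes:
        if node["op"] == "VariableV2":
            frozen.append({
                "name": node["name"],
                "op": "Const",
                "value": trained_values[node["name"]],
            })
        else:
            frozen.append(dict(node))
    return frozen


graph_nodes = [
    {"name": "x", "op": "Placeholder"},
    {"name": "bias", "op": "VariableV2", "shape": [1]},
]
# Value taken from the frozen graph shown above.
trained_values = {"bias": [-15.988138198852539]}

frozen_nodes = freeze(graph_nodes, trained_values)
print(frozen_nodes)
```

After this step the model no longer needs a separate checkpoint file, because the learned values live inside the graph itself.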

Slide 60

Slide 60 text

Before (Training)

Slide 61

Slide 61 text

After (Inference)

Slide 62

Slide 62 text

Performance » Computational Cost » Model Size

Slide 63

Slide 63 text

Pre-trained Models for Image Classification » Inception, AlexNet, MobileNet, ...

Slide 64

Slide 64 text

Model Comparison » Retrained Inception-v3, Retrained MobileNet, PatchNet (Modified Inception-v3) » Specialized for food

Slide 65

Slide 65 text

https://speakerdeck.com/lunardog/cooking-with-food-photos

Slide 66

Slide 66 text

MobileNet ! PatchNet 66

Slide 67

Slide 67 text

Performance Comparison
Metric | Retrained Inception-v3 | Retrained MobileNet | PatchNet (Modified Inception-v3)
Model Size | 77.4 MB | 11.6 MB | 85.9 MB
Execution Time | 05m 44s 166ms | 01m 19s 535ms | 18m 34s 624ms
CPU Usage | 80% | 50% | 80%
Memory Usage | 500 MB | 150 MB | 500 MB

Slide 68

Slide 68 text

Slide 69

Slide 69 text

Graph Complexity » ≈ Computational Cost » ≈ Model Size

Slide 70

Slide 70 text

Inception-v3, MobileNet, PatchNet

Slide 71

Slide 71 text

Benchmark
$ bazel build tensorflow/tools/benchmark/benchmark_model
$ bazel-bin/tensorflow/tools/benchmark/benchmark_model \
    --graph=... \
    --input_layer=... \
    --input_layer_shape=... \
    --input_layer_type=... \
    --output_layer=... \
    ...

Slide 72

Slide 72 text

Benchmark
Metric | Retrained Inception-v3 | Retrained MobileNet | PatchNet
Number of nodes executed | 507 | 264 | 704
AVG Timings (microseconds) | 368284 | 35551.5 | 1.28228e+06
Actual Execution Time | 05m 44s 166ms | 01m 19s 535ms | 18m 34s 624ms
How much faster than PatchNet | 3x faster | 18x faster | -

Slide 73

Slide 73 text

Graph Transformation
$ bazel build tensorflow/tools/graph_transforms:transform_graph
$ bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
    --in_graph=../models/mobilenet.pb \
    --out_graph=../models/optimized-mobilenet.pb \
    --inputs='input_1:0' \
    --outputs='final:0' \
    --transforms='
      strip_unused_nodes(type=float, shape="1,224,224,3")
      remove_nodes(op=Identity, op=CheckNumerics)
      fold_constants(ignore_errors=true)
      fold_batch_norms
      fold_old_batch_norms
      quantize_weights
      sort_by_execution_order'

Slide 74

Slide 74 text

Graph Transformation (MobileNet)
Metric | Original | quantize_weights | More Options
Model Size | 12 MB | 3.3 MB | 3.1 MB
Accuracy | 0.97 | 0.97 | 0.9633
Benchmark | 01m 19s 535ms | 01m 17s 716ms | 01m 13s 816ms
CPU Usage | 40-60% | 40-60% | 40-60%
Memory Usage | 120 MB | 90 MB | 90 MB
Number of nodes executed | 264 | 264 | 236
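The drop from 12 MB to 3.3 MB is roughly what one would expect when 32-bit float weights are stored as 8-bit values. The sketch below shows a simple linear 8-bit quantization scheme to illustrate the idea. It is not the exact algorithm used by the `quantize_weights` transform, and the function names and sample weights are made up for this example.

```python
def quantize(weights):
    """Map float weights to integer codes in 0..255 plus the offset and scale needed to decode them."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    codes = [int(round((w - lo) / scale)) for w in weights]
    return codes, lo, scale


def dequantize(codes, lo, scale):
    """Recover approximate float weights from the 8-bit codes."""
    return [lo + c * scale for c in codes]


weights = [-1.5, -0.2, 0.0, 0.7, 2.3]  # made-up sample weights
codes, lo, scale = quantize(weights)
restored = dequantize(codes, lo, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, max_error)
```

Each weight now fits in one byte instead of four, at the cost of a rounding error of at most half a quantization step. That small error is the reason accuracy can change slightly after the transformation.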

Slide 75

Slide 75 text

Accuracy Basically, accuracy should be the same if we run the same computation graph with the same inputs, but... » Computation Graph » Transformed for mobile » Data preprocessing » Vectorization » Feature Scaling

Slide 76

Slide 76 text

Feature Scaling Make sure that features are on the same scale. Otherwise, a single feature with a broad range of values dominates the result.

Slide 77

Slide 77 text

Normalization (Number) » Rescaling » Mean normalization » Standardization
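The three techniques named on this slide can be sketched in a few lines, using their standard textbook definitions. The function names and the sample data are assumptions made for this example.

```python
import statistics


def rescale(values):
    """Min-max rescaling to the range [0, 1]: (v - min) / (max - min)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def mean_normalize(values):
    """Mean normalization: (v - mean) / (max - min)."""
    mean = statistics.mean(values)
    lo, hi = min(values), max(values)
    return [(v - mean) / (hi - lo) for v in values]


def standardize(values):
    """Standardization: (v - mean) / standard deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]  # made-up sample data
rescaled = rescale(heights_cm)
centered = mean_normalize(heights_cm)
standardized = standardize(heights_cm)
print(rescaled, centered, standardized)
```

Whichever variant is used, the statistics (minimum, maximum, mean, standard deviation) should come from the training data and be applied unchanged on the device, so that the model sees inputs on the same scale it was trained with.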

Slide 78

Slide 78 text

Normalization (Color) val pixels: IntArray → How to process this data?
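One common way to process packed pixel integers is to split each one into its red, green, and blue channels and scale each channel to the range [0, 1]. The sketch below assumes ARGB packing, as returned by Android's `Bitmap.getPixels`, and a model that was trained on inputs in [0, 1]. A model trained with a different preprocessing step would need the matching formula instead. The function name and sample pixels are made up for this example.

```python
def pixel_to_rgb_floats(pixel):
    """Split a packed ARGB integer into [r, g, b] floats in the range [0, 1]."""
    r = (pixel >> 16) & 0xFF
    g = (pixel >> 8) & 0xFF
    b = pixel & 0xFF
    return [r / 255.0, g / 255.0, b / 255.0]


pixels = [0xFF000000, 0xFF00001A, 0xFFFFFFFF]  # made-up sample pixels (alpha = 0xFF)
features = [pixel_to_rgb_floats(p) for p in pixels]
print(features)
```

The masking with `0xFF` also works for negative values, which matters on Android because opaque colors are stored as negative signed 32-bit integers.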

Slide 79

Slide 79 text

Load from Asset Manager / File
constructor(assets: AssetManager) {
    this.inferenceInterface = TensorFlowInferenceInterface(assets, modelName)
}
constructor(file: File) {
    this.inferenceInterface = file.inputStream().use { TensorFlowInferenceInterface(it) }
}

Slide 80

Slide 80 text

Still worried about app size? » Put the model in assets (bundling) » Pros: No need to manage versions » Cons: Increased app size » Download a model into a file » Pros: Reduced app size » Cons: Need to manage versions

Slide 81

Slide 81 text

Still worried about library size? » Splitting APK
import com.android.build.OutputFile
android {
    ...
    splits {
        abi {
            enable true
            reset()
            // ABIs to build a separate APK for
            include 'x86_64', 'x86', 'arm64-v8a', 'armeabi-v7a'
        }
    }
    project.ext.abiCodes = ['x86_64': 1, 'x86': 2, 'arm64-v8a': 3, 'armeabi-v7a': 4].withDefault { 0 }
}
android.applicationVariants.all { variant ->
    variant.outputs.each { output ->
        def baseAbiVersionCode = project.ext.abiCodes.get(output.getFilter(OutputFile.ABI))
        if (baseAbiVersionCode != null) {
            // Give each per-ABI APK a distinct version code
            output.versionCodeOverride = baseAbiVersionCode * 1000 + variant.versionCode
        }
    }
}
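The version code arithmetic in the Gradle script can be checked with a small sketch. It mirrors the formula above, where each ABI gets a numeric code and the final version code is that code times 1000 plus the app's own version code. The function name and the sample version code 12 are made up for this example.

```python
# Same ABI-to-code mapping as in the Gradle script above.
ABI_CODES = {"x86_64": 1, "x86": 2, "arm64-v8a": 3, "armeabi-v7a": 4}


def version_code_for(abi, base_version_code):
    """Per-ABI version code: ABI code * 1000 + base version code (unknown ABIs map to 0)."""
    return ABI_CODES.get(abi, 0) * 1000 + base_version_code


per_abi = {abi: version_code_for(abi, 12) for abi in ABI_CODES}
print(per_abi)
```

Because every per-ABI APK ends up with a unique version code, the store can tell the variants apart. The scheme assumes the base version code stays below 1000.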

Slide 82

Slide 82 text

Asynchronous download + Splitting APK Get the power of AI by just adding 345 MB

Slide 83

Slide 83 text

[Summary] » How to train a model » Difference between TensorFlow Mobile and Lite » How to transform a graph → Freeze / Optimize / Benchmark » How to implement the client side » Model is too big? → Consider asynchronous download » APK is still too big? → Split APK

Slide 84

Slide 84 text

Thank you for listening! Any questions? @rejasupotaro