
ML on Mobile DroidKaigi 2018

rejasupotaro

February 09, 2018

Transcript

  1. Ready for AI? ML on Mobile » Machine Learning & TensorFlow » Build your own model and optimize it DroidKaigi 2018 @rejasupotaro
  2. » Kentaro Takiguchi / @rejasupotaro » Software Engineer » Android (since 2010) → Web → Search → ML » Discovery team at Cookpad UK
  3. One Day » When I send a resume to a company, AI predicts my future performance, and they reject me without an interview
  4. Another Day » When I send a resume to a company, AI predicts my future performance, and they reject me without an interview » When I try to buy insurance, AI detects a risk of future illness, and they reject my application
  5. Another Day » When I send a resume to a company, AI predicts my future performance, and they reject me without an interview » When I try to buy insurance, AI detects a risk of future illness, and they reject my application » When I pass in front of a surveillance camera, AI predicts that I will commit a crime, and I'm arrested
  6. This is the talk for you » ✅ "I'm just interested in AI" » Welcome! » ✅ "I've investigated the feasibility. I want to release it" » This talk focuses on what's happening inside » You'll be able to release your app with confidence
  7. API development is mature Server: "Call this API with the param" GET /users?id=1 Client: "Okay" » We can tell how to use it just by looking at this one line
  8. Productionizing ML has just started ML: "I trained a model. Use this" awesome_model.pb Client: ??? » Machine learning researchers need to know the production environment » Product developers need to learn machine learning
  9. My Motivation It's like Android development around 2012 (everything was difficult...) I feel we need to share knowledge, so I want to share what I learned.
  10. Runtime We would ❤ the official way

    Platform | Runtimes
    Android  | TensorFlow, Deeplearning4j
    iOS      | Core ML, TensorFlow

    Let's ask whether the library they love supports these platforms

  11. What is TensorFlow? “TensorFlow is an interface for expressing machine learning algorithms, and an implementation for executing such algorithms, focusing on a wide variety of heterogeneous systems, ranging from mobile devices up to large-scale distributed systems of hundreds of machines and thousands of computational devices” https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/45166.pdf
  12. What is TensorFlow?

    import tensorflow as tf

    a = tf.constant(2)
    b = tf.constant(3)
    with tf.Session() as session:
        print(session.run(a + b))  # => 5

  13. Why TensorFlow? » There are many libraries such as PyTorch, Caffe2, Theano, Chainer, ... » Most libraries are designed for research, but TensorFlow is designed to run models in production, in various environments including mobile.
  14. TensorFlow Mobile & Lite

    Runtime           | Version (as of 2018/02/01)
    TensorFlow Mobile | 1.4.0
    TensorFlow Lite   | 0.1.1

    » TensorFlow Mobile = TensorFlow Java + Android » TensorFlow Lite = Minimal implementation optimized for mobile and embedded devices
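
    For reference, the matching Gradle dependency lines (the Mobile coordinate also appears on a later slide; the Lite coordinate is an assumption based on the versions above):

    implementation "org.tensorflow:tensorflow-android:1.4.0"  // TensorFlow Mobile
    implementation "org.tensorflow:tensorflow-lite:0.1.1"     // TensorFlow Lite (assumed artifact name)
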
  15. “Google will make Android the best platform for machine learning” 素晴らしいNDKの世界 / Building High Performance Android Apps with NDK, 2018/02/08 15:40-16:30
  16. Run a trained model » Add the dependency: implementation "org.tensorflow:tensorflow-android:1.4.0" » Put a model in src/main/assets » Run the computation graph:

    inferenceInterface.feed(inputName, input, shape.first, shape.second)
    inferenceInterface.run(arrayOf(outputName))
    inferenceInterface.fetch(outputName, output)

    » Give [0, 1] → Get [0]
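
    A minimal end-to-end sketch of the feed/run/fetch calls above, assuming the AND-gate model built later in this deck (the file name optimized_and.pb and the tensor names x and y_pred come from that code; treat them as assumptions for your own model):

    import android.content.res.AssetManager
    import org.tensorflow.contrib.android.TensorFlowInferenceInterface

    class AndGateClassifier(assets: AssetManager) {
        // The frozen graph is bundled in src/main/assets
        private val inference = TensorFlowInferenceInterface(assets, "optimized_and.pb")

        fun predict(x1: Float, x2: Float): Float {
            val input = floatArrayOf(x1, x2)
            val output = FloatArray(1)
            inference.feed("x", input, 1L, 2L)  // shape: 1 sample x 2 features
            inference.run(arrayOf("y_pred"))
            inference.fetch("y_pred", output)
            return output[0]                    // sigmoid output in [0, 1]
        }
    }

    Giving (0f, 1f) should return a value close to 0; (1f, 1f) a value close to 1.
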
  17. That's it! » It can be done within 5 minutes if you have a trained model » You don't need to write C++. You don't need to use CMake. Just use the library.
  18. » x1, x2: input » y: output » w, b: variables » Formula: y_pred = sigmoid(x · w + b)
  19.
    import tensorflow as tf

    x = tf.placeholder(tf.float32, shape=[None, 2], name='x')
    y = tf.placeholder(tf.float32, shape=[None, 1], name='y')
    w = tf.Variable(tf.zeros([2, 1]), name='weight')
    b = tf.Variable(tf.zeros([1]), name='bias')
    y_pred = tf.nn.sigmoid(tf.matmul(x, w) + b, name='y_pred')

    with tf.name_scope("loss"):
        loss = tf.reduce_sum(tf.square(y_pred - y), name='loss')

    with tf.name_scope("train"):
        optimizer = tf.train.AdamOptimizer(learning_rate=0.1, name='optimizer')
        train_step = optimizer.minimize(loss, name='train_step')

    with tf.Session() as session:
        session.run(tf.global_variables_initializer())
        for epoch in range(epochs):  # epochs: number of training iterations
            _, l = session.run(
                [train_step, loss],
                feed_dict={x: input, y: output}  # input/output: training data arrays
            )
  20. (The same training code, shown again.)
  21. (The same training code, shown again.)
  22. Training » Define the problem (inputs and expected outputs) » Build the computation graph » x, y: tf.placeholder » w, b: tf.Variable » y_pred: tf.nn.sigmoid(tf.matmul(x, w) + b) » Find the appropriate values by iterating training steps
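
    For reference, the graph defined above in math form (y_pred is the prediction, L the loss the optimizer minimizes):

    \hat{y} = \sigma(x W + b), \qquad L = \sum_i \left(\hat{y}_i - y_i\right)^2
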
  23. Congrats! » Once you understand how a model is built, you can build other models (by modifying the graph a little) » You don't need to hardcode the logic. The machine learns it for you.
  24. Hardcode Logic

    fun predict(x1: Int, x2: Int): Int {
        return if (x1 == 0 && x2 == 0) { 0 }
        else if (x1 == 1 && x2 == 0) { 0 }
        else if (x1 == 0 && x2 == 1) { 0 }
        else if (x1 == 1 && x2 == 1) { 1 }
        else { throw RuntimeException("I hope it won't happen") }
    }

    You must list all possible cases
  25. Machine Learns

    // [[0, 0], [0, 1], [1, 0], [1, 1]]
    fun predict(x1: Int, x2: Int): Int {
        ??? // Machine learns what value to return
    }
    // => [[0], [0], [0], [1]]

    You don't need to write the logic (see the sketch below)
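
    A sketch of what the learned version looks like on the client, delegating to the hypothetical AndGateClassifier shown earlier instead of enumerating cases:

    // andGate is the hypothetical AndGateClassifier from the earlier sketch
    fun predict(x1: Int, x2: Int): Int {
        val probability = andGate.predict(x1.toFloat(), x2.toFloat())
        return if (probability >= 0.5f) 1 else 0  // threshold the sigmoid output
    }
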
  26. XOR Gate

    // [[0, 0], [0, 1], [1, 0], [1, 1]]
    fun predict(x1: Int, x2: Int): Int {
        ??? // Machine learns what value to return
    }
    // => [[0], [1], [1], [0]]

    Nest more layers when we work on a complex problem (see the formula below)
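
    XOR is not linearly separable, so the single-layer formula from before cannot fit it; nesting one hidden layer gives (W1, b1, W2, b2 being the layers' variables):

    \hat{y} = \sigma\!\left(W_2 \, \sigma(W_1 x + b_1) + b_2\right)
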
  27. Estimate Weight

    // [170cm, Man, ...]
    fun predict(...): Int {
        ??? // Machine learns what value to return
    }
    // => 70kg
  28. Translation » "I am learning machine learning" » → ["i", "am", "learning", "machine", "learning"] » → [0.53177921, 0.7965138, 0.66959208, ...] » → ["私", "は", "機械学習", "を", "勉強", "し", "て", "い", "ます"] » → "私は機械学習を勉強しています"
  29. Image Classification » [#000000, #00001A, ...] » → [[0, 0, 0], [0, 0, 0.1], ...] » → "Cat" or "Burrito"
  30. Training » Define a problem (inputs and expected outputs) » Build a computation graph » Find the appropriate values by iterating training steps
    Inference » Get the predicted value
  31. Why we need to transform the graph 1. To enable the model to run on mobile » Supported operations are limited 2. For better performance » Compute resources are restricted » We shouldn't drain users' phone batteries
  32. 1. To enable the model to run on mobile

    $ find ./tensorflow/python/ -type f | grep '.py$' | wc -l
    968
    $ find ./tensorflow/java/ -type f | grep '.java$' | wc -l
    42

    Feature                | Python                                              | Java
    Run a predefined graph | tf.import_graph_def, tf.Session                     | TensorFlowInferenceInterface
    Graph construction     | Yes                                                 | -
    Gradients              | tf.gradients                                        | -
    Functions              | tf.python.framework.function.Defun                  | -
    Control flow           | tf.cond, tf.while_loop                              | -
    Neural network library | tf.train, tf.nn, tf.contrib.layers, tf.contrib.slim | -

  33. 2. For better performance

            | Pixel 2 XL (my phone)                        | Oppo A37f (most used)
    Chipset | Qualcomm MSM8996 Snapdragon 821              | Qualcomm MSM8916 Snapdragon 410
    CPU     | Quad-core (2x2.15 GHz Kryo & 2x1.6 GHz Kryo) | Quad-core 1.2 GHz Cortex-A53
    GPU     | Adreno 530                                   | Adreno 306
    RAM     | 4 GB                                         | 2 GB

    There is a big gap between developers and users

  34. Freeze: Variable to Constant

    node {
      name: "x"
      op: "Placeholder"
      ...
    }
    node {
      name: "bias"
      op: "VariableV2"
      attr { key: "container" value { s: "" } }
      attr { key: "shape" value { shape { dim { size: 1 } } } }
      ...

  35. Freeze: Variable to Constant » Train → Checkpoint file is created → Freeze

    $ bazel build tensorflow/python/tools:optimize_for_inference
    $ bazel-bin/tensorflow/python/tools/optimize_for_inference \
        --input=../logic-gate/logic-gate-python/models/and.pb \
        --output=../logic-gate/logic-gate-python/models/optimized_and.pb \
        --frozen_graph=True \
        --input_names=x \
        --output_names=y_pred

  36. Freeze: Variable to Constant

    node {
      name: "x"
      op: "Placeholder"
      ...
    }
    node {
      name: "bias"
      op: "Const"
      attr {
        key: "value"
        value {
          tensor {
            dtype: DT_FLOAT
            tensor_shape { dim { size: 1 } }
            float_val: -15.988138198852539
          }
        }
      }
    }
    ...

    bias became a Const with a fixed float value
  37. Performance Comparison

                   | Retrained Inception-v3 | Retrained MobileNet | PatchNet (Modified Inception-v3)
    Model Size     | 77.4 MB                | 11.6 MB             | 85.9 MB
    Execution Time | 05m 44s 166ms          | 01m 19s 535ms       | 18m 34s 624ms
    CPU Usage      | 80%                    | 50%                 | 80%
    Memory Usage   | 500 MB                 | 150 MB              | 500 MB

  38. (Image-only slide.)

  39. Benchmark

    $ bazel build tensorflow/tools/benchmark:benchmark_model
    $ bazel-bin/tensorflow/tools/benchmark/benchmark_model \
        --graph=... \
        --input_layer=... \
        --input_layer_shape=... \
        --input_layer_type=... \
        --output_layer=... \
        ...

  40. Benchmark

                                  | Retrained Inception-v3 | Retrained MobileNet | PatchNet
    Number of nodes executed      | 507                    | 264                 | 704
    AVG timings (microseconds)    | 368284                 | 35551.5             | 1.28228e+06
    Actual execution time         | 05m 44s 166ms          | 01m 19s 535ms       | 18m 34s 624ms
    How much faster than PatchNet | 3x faster              | 18x faster          | -

  41. Graph Transformation

    $ bazel build tensorflow/tools/graph_transforms:transform_graph
    $ bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
        --in_graph=../models/mobilenet.pb \
        --out_graph=../models/optimized-mobilenet.pb \
        --inputs='input_1:0' \
        --outputs='final:0' \
        --transforms='
          strip_unused_nodes(type=float, shape="1,224,224,3")
          remove_nodes(op=Identity, op=CheckNumerics)
          fold_constants(ignore_errors=true)
          fold_batch_norms
          fold_old_batch_norms
          quantize_weights
          sort_by_execution_order'

  42. Graph Transformation (MobileNet)

                             | Original      | quantize_weights | More options
    Model Size               | 12 MB         | 3.3 MB           | 3.1 MB
    Accuracy                 | 0.97          | 0.97             | 0.9633
    Benchmark                | 01m 19s 535ms | 01m 17s 716ms    | 01m 13s 816ms
    CPU Usage                | 40-60%        | 40-60%           | 40-60%
    Memory Usage             | 120 MB        | 90 MB            | 90 MB
    Number of nodes executed | 264           | 264              | 236

  43. Accuracy Basically, accuracy should be the same if we run the same computation graph on the same inputs, but differences can creep in through: » Computation graph » Transformed for mobile » Data preprocessing » Vectorization » Feature scaling
  44. Feature Scaling Make sure that features are on the same scale; otherwise one feature with a broad range of values dominates the result (see the sketch below)
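
    A minimal min-max scaling sketch on the client side (one common form of feature scaling; trainMin and trainMax are assumptions here: they must be the ranges observed at training time, shipped alongside the model):

    // Scales each feature into [0, 1] using the ranges seen during training
    fun scale(features: FloatArray, trainMin: FloatArray, trainMax: FloatArray): FloatArray =
        FloatArray(features.size) { i ->
            val range = trainMax[i] - trainMin[i]
            if (range == 0f) 0f else (features[i] - trainMin[i]) / range
        }
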
  45. Load from Asset Manager / File

    constructor(assets: AssetManager) {
        this.inferenceInterface = TensorFlowInferenceInterface(assets, modelName)
    }

    constructor(file: File) {
        this.inferenceInterface = file.inputStream().use { TensorFlowInferenceInterface(it) }
    }

  46. Still worried about app size? » Put the model in assets (bundling) » Pros: no need to manage versions » Cons: increased app size » Download the model into a file » Pros: reduced app size » Cons: need to manage versions (a download sketch follows below)
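
    A minimal sketch of the download option, feeding the File constructor from the previous slide (the URL and file name are hypothetical; versioning, retries, and error handling omitted):

    import java.io.File
    import java.net.URL

    // Downloads a model into the app's files dir, then hands it to the classifier
    fun downloadModel(filesDir: File): File {
        val modelFile = File(filesDir, "and.pb")
        URL("https://example.com/models/and.pb").openStream().use { input ->
            modelFile.outputStream().use { output -> input.copyTo(output) }
        }
        return modelFile
    }

    Run the download off the main thread, then construct the classifier with the returned File.
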
  47. Still worried about library size? » Splitting APK

    android {
        ...
        splits {
            abi {
                enable true
                reset()
                include 'x86_64', 'x86', 'arm64-v8a', 'armeabi-v7a'
            }
        }
        project.ext.abiCodes = ['x86_64': 1, 'x86': 2, 'arm64-v8a': 3, 'armeabi-v7a': 4].withDefault { 0 }
    }

    android.applicationVariants.all { variant ->
        variant.outputs.each { output ->
            def baseAbiVersionCode = project.ext.abiCodes.get(output.getFilter(OutputFile.ABI))
            if (baseAbiVersionCode != null) {
                output.versionCodeOverride = baseAbiVersionCode * 1000 + variant.versionCode
            }
        }
    }

  48. [Summary] » How to train a model » Difference between TensorFlow Mobile and Lite » How to transform a graph → freeze / optimize / benchmark » How to implement the client side » Model too big? → Consider asynchronous download » APK still too big? → Split the APK