ML on Mobile DroidKaigi 2018

rejasupotaro
February 09, 2018

Transcript

  1. Ready for AI? ML on Mobile
    » Machine Learning & TensorFlow
    » Build your own model and optimize it
    DroidKaigi 2018 @rejasupotaro
  2. » Kentaro Takiguchi / @rejasupotaro
    » Software Engineer
    » Android (since 2010) → Web → Search → ML
    » Discovery team at Cookpad UK
  3. Discovery = Search + Recommendation
    Provide users with the right content at the right time
  4. Deep Learning Achievements Over The Past Year
    https://blog.statsbot.co/deep-learning-achievements-4c563e034257
  5. Artificial Intelligence, Blockchain, IoT, VR, Serverless architecture, ...
  6. Best Apps in 2017: Socratic, Pinterest, Faceapp
  7. Google Flights will now predict airline delays before the airlines do
  8. One Day
    » When I send a resume to a company, AI predicts my future performance, then they reject me without an interview
  9. Another Day
    » When I send a resume to a company, AI predicts my future performance, then they reject me without an interview
    » When I attempt to buy insurance, AI detects a risk of future illness, then they reject my application
  10. Another Day
    » When I send a resume to a company, AI predicts my future performance, then they reject me without an interview
    » When I attempt to buy insurance, AI detects a risk of future illness, then they reject my application
    » When I pass in front of a surveillance camera, AI detects that I will commit a crime, then I'm arrested
  11. Sounds fun!

  12. This is the talk for you
    » ✅ "I'm just interested in AI"
        » Welcome!
    » ✅ "I've investigated the feasibility. I want to release"
        » This talk focuses on what's happening inside
        » You will be able to release your app with confidence
  13. API development has matured
    Server: "Call this API with the param"
        GET /users?id=1
    Client: "Okay"
    » We know how to use it just by looking at this one line
  14. Productionizing ML has just started
    ML: "I trained a model. Use this"
        awesome_model.pb
    Client: ???
    » Machine learning researchers need to know the real environment
    » Product developers need to learn machine learning
  15. My Motivation
    It's like Android development around 2012 (everything was difficult...)
    I feel we need to share knowledge. I want to share what I learned.
  16. Build This AI App Together
    https://github.com/rejasupotaro/logic-gate
  17. Why TensorFlow?
    » There are many libraries such as PyTorch, Caffe2, Theano, Chainer, ...
  18. Runtime
    We would ❤ the official way
    » Android: TensorFlow, Deeplearning4j
    » iOS: Core ML, TensorFlow
    Let's ask if the library they love supports these platforms
  19. Train → (IR) → Inference

  20. What is TensorFlow?
    “TensorFlow is an interface for expressing machine learning algorithms, and an implementation for executing such algorithms, focusing on a wide variety of heterogeneous systems, ranging from mobile devices up to large-scale distributed systems of hundreds of machines and thousands of computational devices”
    https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/45166.pdf
  21. What is TensorFlow?
    https://www.tensorflow.org/programmers_guide/graphs
  22. What is TensorFlow?
        import tensorflow as tf

        a = tf.constant(2)
        b = tf.constant(3)
        with tf.Session() as session:
            print(session.run(a + b))  # => 5
  23. Why TensorFlow?
    » There are many libraries such as PyTorch, Caffe2, Theano, Chainer, ...
    » Most libraries are designed for research, but TensorFlow is designed to run models in production, in various environments including mobile.
  24. Architecture of TensorFlow
  25. TensorFlow Mobile & Lite
  26. Architecture of TensorFlow Lite
    https://developer.android.com/ndk/guides/neuralnetworks/index.html
  27. Arm support for Android NNAPI gives >4x performance boost
    https://arxiv.org/pdf/1801.06274.pdf
  28. TensorFlow Mobile (17.2 MB), TensorFlow Lite (1.5 MB)
  29. TensorFlow Mobile & Lite
    Versions as of 2018/02/01: TensorFlow Mobile 1.4.0, TensorFlow Lite 0.1.1
    » TensorFlow Mobile = TensorFlow Java + Android
    » TensorFlow Lite = Minimal implementation optimized for mobile and embedded devices
  30. “Google will make Android the best platform for machine learning”
    素晴らしいNDKの世界 / Building High Performance Android Apps with NDK (2018/02/08 15:40-16:30)
  31. Let's see how to run a trained model

  32. Run a trained model
    » Add the dependency.
        implementation "org.tensorflow:tensorflow-android:1.4.0"
    » Put a model in src/main/assets
    » Run the computation graph
        inferenceInterface.feed(inputName, input, shape.first, shape.second)
        inferenceInterface.run(arrayOf(outputName))
        inferenceInterface.fetch(outputName, output)
    » Give [0, 1] → Get [0]
  33. That's it!
    » It can be done within 5 minutes if you have a trained model
    » You don't need to write C++. You don't need to use CMake. Just use the library.
  34. Logic AND Gate
    | Input | Output |
    | 0, 0  | 0      |
    | 0, 1  | 0      |
    | 1, 0  | 0      |
    | 1, 1  | 1      |
  35. » $x_1$, $x_2$: input
    » $y$: output
    » $w$, $b$: variables
    » Formula: $y = \mathrm{sigmoid}(x w + b)$, with $x = [x_1, x_2]$
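To make the formula concrete, here is a minimal pure-Python sketch (no TensorFlow) that evaluates a sigmoid of a weighted sum for the four AND inputs. The weights `(10.0, 10.0)` and bias `-15.0` are hand-picked assumptions for illustration, not values from the talk, and the function names are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function, squashing any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(x1, x2, w=(10.0, 10.0), b=-15.0):
    """Evaluate sigmoid(x1*w1 + x2*w2 + b) with hand-picked (not learned) values."""
    return sigmoid(x1 * w[0] + x2 * w[1] + b)


and_inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_outputs = [round(predict(x1, x2)) for x1, x2 in and_inputs]
print(and_outputs)  # [0, 0, 0, 1]
```

Only the input (1, 1) pushes the weighted sum above zero, so only that row rounds to 1. Training is the process of finding such weights automatically.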
  36. Let's define a graph
  37.
        import tensorflow as tf  # TensorFlow 1.x API

        # Training data for the AND gate (from the truth table above)
        inputs = [[0, 0], [0, 1], [1, 0], [1, 1]]
        outputs = [[0], [0], [0], [1]]
        epochs = 1000  # assumed value; the slide reads it from self.args.epochs

        x = tf.placeholder(tf.float32, shape=[None, 2], name='x')
        y = tf.placeholder(tf.float32, shape=[None, 1], name='y')
        w = tf.Variable(tf.zeros([2, 1]), name='weight')
        b = tf.Variable(tf.zeros([1]), name='bias')
        y_pred = tf.nn.sigmoid(tf.matmul(x, w) + b, name='y_pred')

        with tf.name_scope("loss"):
            loss = tf.reduce_sum(tf.square(y_pred - y), name='loss')
            tf.summary.scalar('loss', loss)  # gives merge_all() something to merge
        with tf.name_scope("train"):
            optimizer = tf.train.AdamOptimizer(learning_rate=0.1, name='optimizer')
            train_step = optimizer.minimize(loss, name='train_step')
        merged = tf.summary.merge_all()

        with tf.Session() as session:
            session.run(tf.global_variables_initializer())
            for epoch in range(epochs):
                _, summary, l = session.run(
                    [train_step, merged, loss],
                    feed_dict={x: inputs, y: outputs}
                )
  40. Training
    » Define the problem (inputs and expected outputs)
    » Build the computation graph
        » $x$: tf.placeholder
        » $w$, $b$: tf.Variable
        » $y$: tf.nn.sigmoid(tf.matmul(x, w) + b)
    » Find the appropriate values by iterating training steps
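The three training steps can also be sketched without TensorFlow. The following is a simplified stand-in, not the talk's code: it uses plain gradient descent instead of the Adam optimizer, and the function name, epoch count, and learning rate are assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_and_gate(epochs=5000, learning_rate=0.5):
    """Fit w1, w2, b with plain gradient descent on the summed squared error."""
    # Step 1: define the problem (inputs and expected outputs).
    samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
    w1 = w2 = b = 0.0  # zero-initialised, like tf.zeros in the graph
    # Step 3: iterate training steps to find appropriate values.
    for _ in range(epochs):
        grad_w1 = grad_w2 = grad_b = 0.0
        for (x1, x2), target in samples:
            # Step 2: the "graph" is just sigmoid(x1*w1 + x2*w2 + b).
            y_pred = sigmoid(x1 * w1 + x2 * w2 + b)
            # d/dz of (y_pred - target)^2, using sigmoid'(z) = y_pred * (1 - y_pred)
            delta = 2.0 * (y_pred - target) * y_pred * (1.0 - y_pred)
            grad_w1 += delta * x1
            grad_w2 += delta * x2
            grad_b += delta
        w1 -= learning_rate * grad_w1
        w2 -= learning_rate * grad_w2
        b -= learning_rate * grad_b
    return w1, w2, b


w1, w2, b = train_and_gate()
predictions = [round(sigmoid(x1 * w1 + x2 * w2 + b))
               for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]]
print(predictions)
```

The learned bias ends up negative and the weights positive, which is the same shape of solution as the frozen `bias` constant shown later in the talk.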
  41. Visualizing a Model
  42. Congrats!
    » Once you understand how a model is built, you can build any model (by modifying the graph a little)
    » You don't need to hardcode the logic. The machine learns how to do it for you.
  43. Hardcode Logic
        fun predict(x1: Int, x2: Int): Int {
            return if (x1 == 0 && x2 == 0) {
                0
            } else if (x1 == 1 && x2 == 0) {
                0
            } else if (x1 == 0 && x2 == 1) {
                0
            } else if (x1 == 1 && x2 == 1) {
                1
            } else {
                throw RuntimeException("I hope it won't happen")
            }
        }
    List all possible cases
  44. Machine Learns
        // [[0, 0], [0, 1], [1, 0], [1, 1]]
        fun predict(x1: Int, x2: Int): Int {
            ??? // Machine learns what value to return
        }
        // => [[0], [0], [0], [1]]
    You don't need to write the logic
  45. XOR Gate
        // [[0, 0], [0, 1], [1, 0], [1, 1]]
        fun predict(x1: Int, x2: Int): Int {
            ??? // Machine learns what value to return
        }
        // => [[0], [1], [1], [0]]
    Add more layers when working on a more complex problem
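XOR cannot be computed by a single weighted sum, which is why an extra layer is needed. The sketch below shows one possible two-layer arrangement with hand-picked weights and a simple threshold activation. It illustrates the idea of nesting layers and is not a trained model or the talk's implementation.

```python
def step(z):
    """Threshold activation: 1 if the weighted sum is positive, else 0."""
    return 1 if z > 0 else 0


def neuron(x1, x2, w1, w2, b):
    return step(x1 * w1 + x2 * w2 + b)


def xor(x1, x2):
    # Hidden layer: two single-layer gates computed from the raw inputs.
    h_or = neuron(x1, x2, 1.0, 1.0, -0.5)     # OR
    h_nand = neuron(x1, x2, -1.0, -1.0, 1.5)  # NAND
    # Output layer: AND of the two hidden units.
    return neuron(h_or, h_nand, 1.0, 1.0, -1.5)


xor_outputs = [xor(x1, x2) for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]]
print(xor_outputs)  # [0, 1, 1, 0]
```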
  46. Estimate Weight
        // [170cm, Man, ...]
        fun predict(...): Int {
            ??? // Machine learns what value to return
        }
        // => 70kg
  47. Translation
    » "I am learning machine learning"
    » → ["i", "am", "learning", "machine", "learning"]
    » → [0.53177921, 0.7965138, 0.66959208, ...]
    » → ["私", "は", "機械学習", "を", "勉強", "し", "て", "い", "ます"]
    » → "私は機械学習を勉強しています"
  48. Image Classification
    » [#000000, #00001A, ...]
    » → [[0, 0, 0], [0, 0, 0.1], ...]
    » → "Cat" or "Burrito"
  49. Deploy to Production
    » Optimize Model
    » Deploy Model
  50. In practice, we don't run such a graph in production
  51. Unnecessary nodes + No need to be variables
  52. Training
    » Define a problem (inputs and expected outputs)
    » Build a computation graph
    » Find the appropriate values by iterating training steps
    Inference
    » Get the predicted value
  53. Why we need to transform the graph
    1. To enable the model to run on mobile
        » Supported operations are limited
    2. For better performance
        » Computing resources are restricted
        » Shouldn't drain users' phone battery
  54. 1. To enable the model to run on mobile
        $ find ./tensorflow/python/ -type f | grep '.py$' | wc -l
        968
        $ find ./tensorflow/java/ -type f | grep '.java$' | wc -l
        42
    | Feature                | Python                                                 | Java                         |
    | Run a predefined Graph | tf.import_graph_def, tf.Session                        | TensorFlowInferenceInterface |
    | Graph construction     | Yes                                                    |                              |
    | Gradients              | tf.gradients                                           |                              |
    | Functions              | tf.python.framework.function.Defun                     |                              |
    | Control Flow           | tf.cond, tf.while_loop                                 |                              |
    | Neural Network library | tf.train, tf.nn, tf.contrib.layers, tf.contrib.slim    |                              |
  55. 2. For better performance
    |         | Pixel 2 XL (my phone)                        | Oppo A37f (most used)           |
    | Chipset | Qualcomm MSM8996 Snapdragon 821              | Qualcomm MSM8916 Snapdragon 410 |
    | CPU     | Quad-core (2x2.15 GHz Kryo & 2x1.6 GHz Kryo) | Quad-core 1.2 GHz Cortex-A53    |
    | GPU     | Adreno 530                                   | Adreno 306                      |
    | RAM     | 4 GB                                         | 2 GB                            |
    There is a big gap between developers and users
  56. Graph Transformation
    Model Size ↓, Performance ↑, (Accuracy ↓)

  57. Freeze: Variable to Constant
        node {
          name: "x"
          op: "Placeholder"
          ...
        }
        node {
          name: "bias"
          op: "VariableV2"
          attr { key: "container" value { s: "" } }
          attr { key: "shape" value { shape { dim { size: 1 } } } }
          ...
  58. Freeze: Variable to Constant
    » Train
    » Checkpoint file is created
    » Freeze
        $ bazel build tensorflow/python/tools:optimize_for_inference
        $ bazel-bin/tensorflow/python/tools/optimize_for_inference \
            --input=../logic-gate/logic-gate-python/models/and.pb \
            --output=../logic-gate/logic-gate-python/models/optimized_and.pb \
            --frozen_graph=True \
            --input_names=x \
            --output_names=y_pred
  59. Freeze: Variable to Constant
        node {
          name: "x"
          op: "Placeholder"
          ...
        }
        node {
          name: "bias"
          op: "Const"
          attr {
            key: "value"
            value {
              tensor {
                dtype: DT_FLOAT
                tensor_shape { dim { size: 1 } }
                float_val: -15.988138198852539
              }
            }
          }
          ...
    bias became Const with a fixed float value
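Conceptually, freezing replaces each variable node with a constant that holds the trained value. The toy sketch below mimics that on a list of dictionaries. The `freeze` function and the dictionary layout are simplified stand-ins invented for this illustration and are not TensorFlow's actual graph format or API.

```python
def freeze(nodes, checkpoint):
    """Return a copy of `nodes` where each variable becomes a constant.

    `nodes` is a list of dicts with "name" and "op"; `checkpoint` maps variable
    names to their trained values. Both are simplified stand-ins for this sketch.
    """
    frozen = []
    for node in nodes:
        if node["op"] == "VariableV2":
            frozen.append({"name": node["name"], "op": "Const",
                           "value": checkpoint[node["name"]]})
        else:
            frozen.append(dict(node))
    return frozen


training_graph = [
    {"name": "x", "op": "Placeholder"},
    {"name": "bias", "op": "VariableV2"},
]
checkpoint = {"bias": [-15.988138198852539]}  # value shown on the slide
frozen_graph = freeze(training_graph, checkpoint)
print(frozen_graph[1])
```

The placeholder stays as it is because it is still the input at inference time. Only the trained parameters are baked in.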
  60. Before (Training)
  61. After (Inference)
  62. Performance
    » Computational Cost
    » Model Size
  63. Pre-trained Models for Image Classification
    » Inception, AlexNet, MobileNet, ...
  64. Model Comparison
    » Retrained Inception-v3, Retrained MobileNet, PatchNet (Modified Inception-v3)
    » Specialized for food
  65. https://speakerdeck.com/lunardog/cooking-with-food-photos
  66. MobileNet, PatchNet

  67. Performance Comparison
    |                | Retrained Inception-v3 | Retrained MobileNet | PatchNet (Modified Inception-v3) |
    | Model Size     | 77.4 MB                | 11.6 MB             | 85.9 MB                          |
    | Execution Time | 05m 44s 166ms          | 01m 19s 535ms       | 18m 34s 624ms                    |
    | CPU Usage      | 80%                    | 50%                 | 80%                              |
    | Memory Usage   | 500 MB                 | 150 MB              | 500 MB                           |
  68. 68

  69. Graph Complexity
    » ≈ Computational Cost
    » ≈ Model Size
  70. Inception-v3, MobileNet, PatchNet
  71. Benchmark
        $ bazel build tensorflow/tools/benchmark/benchmark_model
        $ bazel-bin/tensorflow/tools/benchmark/benchmark_model \
            --graph=... \
            --input_layer=... \
            --input_layer_shape=... \
            --input_layer_type=... \
            --output_layer=... \
            ...
  72. Benchmark
    |                               | Retrained Inception-v3 | Retrained MobileNet | PatchNet      |
    | Number of nodes executed      | 507                    | 264                 | 704           |
    | AVG Timings (microseconds)    | 368284                 | 35551.5             | 1.28228e+06   |
    | Actual Execution Time         | 05m 44s 166ms          | 01m 19s 535ms       | 18m 34s 624ms |
    | How much faster than PatchNet | 3x faster              | 18x faster          | -             |
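As a small worked example, the speed-up relative to PatchNet can be recomputed from the "Actual Execution Time" row. This sketch only redoes the arithmetic on the numbers in the table. The helper name is hypothetical, and because the ratio depends on which row is used, the result may not match the rounded figures on the slide exactly.

```python
def to_seconds(minutes, seconds, millis):
    return minutes * 60 + seconds + millis / 1000.0


# Wall-clock execution times copied from the table above.
execution_time = {
    "Retrained Inception-v3": to_seconds(5, 44, 166),
    "Retrained MobileNet": to_seconds(1, 19, 535),
    "PatchNet": to_seconds(18, 34, 624),
}

baseline = execution_time["PatchNet"]
speedup = {name: baseline / t for name, t in execution_time.items()}
for name, ratio in speedup.items():
    print(f"{name}: {ratio:.1f}x")
```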
  73. Graph Transformation
        $ bazel build tensorflow/tools/graph_transforms:transform_graph
        $ bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
            --in_graph=../models/mobilenet.pb \
            --out_graph=../models/optimized-mobilenet.pb \
            --inputs='input_1:0' \
            --outputs='final:0' \
            --transforms='
              strip_unused_nodes(type=float, shape="1,224,224,3")
              remove_nodes(op=Identity, op=CheckNumerics)
              fold_constants(ignore_errors=true)
              fold_batch_norms
              fold_old_batch_norms
              quantize_weights
              sort_by_execution_order'
  74. Graph Transformation (MobileNet)
    |                          | Original      | quantize_weights | More Options  |
    | Model Size               | 12 MB         | 3.3 MB           | 3.1 MB        |
    | Accuracy                 | 0.97          | 0.97             | 0.9633        |
    | Benchmark                | 01m 19s 535ms | 01m 17s 716ms    | 01m 13s 816ms |
    | CPU Usage                | 40-60%        | 40-60%           | 40-60%        |
    | Memory Usage             | 120 MB        | 90 MB            | 90 MB         |
    | Number of nodes executed | 264           | 264              | 236           |
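The drop from 12 MB to 3.3 MB is roughly what one would expect if 32-bit float weights are stored as 8-bit values. The sketch below shows a simple linear 8-bit quantization to convey the idea. It is an illustrative assumption about the general technique, not the actual implementation of the `quantize_weights` transform, and the function names and sample weights are made up.

```python
def quantize(weights):
    """Map float weights to integers in 0..255 plus the (min, max) range."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 255.0 or 1.0  # avoid dividing by zero for constant weights
    codes = [round((w - lo) / scale) for w in weights]
    return codes, lo, hi


def dequantize(codes, lo, hi):
    """Recover approximate float weights from the 8-bit codes."""
    scale = (hi - lo) / 255.0 or 1.0
    return [lo + c * scale for c in codes]


weights = [-1.0, -0.25, 0.0, 0.4, 1.0]  # made-up sample weights
codes, lo, hi = quantize(weights)
restored = dequantize(codes, lo, hi)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, max_error)
```

Each weight now needs one byte instead of four, at the cost of a rounding error of at most half a quantization step. That small error is why accuracy can change slightly after transformation.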
  75. Accuracy
    Basically, accuracy should be the same if we run the same computation graph with the same inputs, but...
    » Computation Graph
        » Transformed for mobile
    » Data preprocessing
        » Vectorization
        » Feature Scaling
  76. Feature Scaling
    Make sure that features are on the same scale.
    Otherwise, a single feature with a broad range of values dominates the result.
  77. Normalization (Number)
    » Rescaling
    » Mean normalization
    » Standardization
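The three scaling methods named on this slide can be sketched as follows, using their usual textbook definitions. The sample heights are made up for illustration, and the function names are hypothetical.

```python
import statistics


def rescale(values):
    """Min-max rescaling into [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def mean_normalize(values):
    """Centre on the mean and divide by the range."""
    lo, hi = min(values), max(values)
    mean = statistics.mean(values)
    return [(v - mean) / (hi - lo) for v in values]


def standardize(values):
    """Zero mean and unit (population) standard deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]  # made-up sample data
print(rescale(heights_cm))         # [0.0, 0.25, 0.5, 0.75, 1.0]
print(mean_normalize(heights_cm))  # [-0.5, -0.25, 0.0, 0.25, 0.5]
print(standardize(heights_cm))
```

Whichever method is used during training has to be applied identically on the device, with the same minimum, maximum, mean, and standard deviation, or the model sees inputs on a different scale.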

  78. Normalization (Color)
        val pixels: IntArray
    → How to process this data?
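One common way to process such an array is to unpack each packed color integer into its channels and scale them to [0, 1]. The sketch below assumes the ARGB packing used by Android color integers and a simple divide-by-255 scaling. The scaling a real model needs depends on how it was trained, so treat both the function name and the scaling as assumptions.

```python
def argb_to_normalized_rgb(pixel):
    """Split a packed 32-bit ARGB pixel into red, green, blue floats in [0, 1]."""
    red = (pixel >> 16) & 0xFF
    green = (pixel >> 8) & 0xFF
    blue = pixel & 0xFF
    return [red / 255.0, green / 255.0, blue / 255.0]


pixels = [0xFF000000, 0xFF00001A, 0xFFFFFFFF]  # opaque black, near-black blue, white
features = [argb_to_normalized_rgb(p) for p in pixels]
print(features)
```

The second pixel corresponds to the `#00001A` example from the image classification slide: its blue channel, 26 out of 255, becomes roughly 0.1.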
  79. Load from Asset Manager / File
        constructor(assets: AssetManager) {
            this.inferenceInterface = TensorFlowInferenceInterface(assets, modelName)
        }

        constructor(file: File) {
            this.inferenceInterface = file.inputStream().use { TensorFlowInferenceInterface(it) }
        }
  80. Still worried about app size?
    » Put the model in assets (bundling)
        » Pros: No need to manage versions
        » Cons: Increased app size
    » Download a model into a file
        » Pros: Reduced app size
        » Cons: Need to manage versions
  81. Still worried about library size?
    » Splitting APK
        import com.android.build.OutputFile

        android {
            ...
            splits {
                abi {
                    enable true
                    reset()
                    include 'x86_64', 'x86', 'arm64-v8a', 'armeabi-v7a'
                }
            }
            // Unknown ABIs (e.g. a universal APK) fall back to 0
            project.ext.abiCodes = ['x86_64': 1, 'x86': 2, 'arm64-v8a': 3, 'armeabi-v7a': 4].withDefault { 0 }
        }

        android.applicationVariants.all { variant ->
            variant.outputs.each { output ->
                def baseAbiVersionCode = project.ext.abiCodes.get(output.getFilter(OutputFile.ABI))
                if (baseAbiVersionCode != null) {
                    // Each per-ABI APK gets a distinct version code
                    output.versionCodeOverride = baseAbiVersionCode * 1000 + variant.versionCode
                }
            }
        }
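The version code arithmetic in the Gradle snippet can be checked with a few lines of Python. This is only a sketch of the numbering scheme, written in Python for brevity. The function name and the base version code `12` are made up for illustration.

```python
# ABI prefixes taken from the abiCodes map in the Gradle snippet.
ABI_CODES = {"x86_64": 1, "x86": 2, "arm64-v8a": 3, "armeabi-v7a": 4}


def version_code_for(abi, base_version_code):
    """ABI prefix * 1000 + the app's own version code (prefix 0 for unknown ABIs)."""
    return ABI_CODES.get(abi, 0) * 1000 + base_version_code


print(version_code_for("arm64-v8a", 12))  # 3012
print(version_code_for(None, 12))         # 12
```

Giving each ABI its own thousands prefix keeps the version codes of the split APKs distinct for the same release.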
  82. Asynchronous download + Splitting APK
    → Get power of AI by just adding 345 MB
  83. [Summary]
    » How to train a model
    » Difference between TensorFlow Mobile and Lite
    » How to transform a graph → Freeze/Optimize/Benchmark
    » How to implement client-side
    » Model is too big? → Consider asynchronous download
    » APK is still too big? → Split APK
  84. Thank you for listening! Any questions? @rejasupotaro